ibm-granite/granite.debug-tools
Description: Granite Debug Tools
Language: Python
License: Apache-2.0
Stars: 8
Forks: 2
Open issues: 1
Created: 2026-04-16T18:30:00Z
Pushed: 2026-06-10T14:27:20Z
Default branch: main
Fork: no
Archived: no
README:
Granite.Debug Tools
Granite.Debug is a suite of self-service debugging tools for Large Language Models (LLMs).
The tools help identify, evaluate, and resolve issues across fine-tuning workflows, benchmark analysis, and agent-based LLM interactions.
Available Tools
Selecting the Right Tool
| If I need to... | Then I should use... |
| ---------------- | -------------------- |
| Design scaffolded tasks to diagnose which skill-level capability is missing | [STaD](./STaD/) |
| Benchmark LLM serving endpoints and local inference with an MCP-based tool | [perfbench](./perfbench/) |
| Validate model behavior across inference engines (vLLM, llama.cpp, Ollama) | [runtimes-validator](./runtimes-validator/) |
STaD - Scaffolded Task Design
[STaD](./STaD/) is a framework for generating scaffolded variations of multi-step reasoning tasks to enable systematic LLM debugging, evaluation, and training.
Use STaD when you need to design scaffolded tasks to diagnose which skill-level capability is missing in your model.
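To make the idea of scaffolding concrete, here is a minimal sketch of one possible reading: a multi-step task is turned into variants that reveal progressively more intermediate results, so the level at which a model starts succeeding hints at which step it could not do on its own. This is an illustration only, not STaD's actual API; `Step`, `ScaffoldedTask`, and `scaffold` are hypothetical names, and the hint format is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    skill: str     # hypothetical label for the capability this step exercises
    question: str  # the intermediate sub-question
    answer: str    # the intermediate result


@dataclass(frozen=True)
class ScaffoldedTask:
    given_steps: int  # how many leading steps are handed to the model
    prompt: str
    expected: str     # final answer, identical across all variants


def scaffold(task: str, steps: List[Step]) -> List[ScaffoldedTask]:
    """Build one variant per scaffolding level, from no hints to all but the last step."""
    variants = []
    for k in range(len(steps)):
        # Reveal the first k intermediate results as hints.
        hints = "\n".join(f"Given: {s.question} -> {s.answer}" for s in steps[:k])
        prompt = task if not hints else f"{task}\n{hints}"
        variants.append(ScaffoldedTask(k, prompt, steps[-1].answer))
    return variants


if __name__ == "__main__":
    demo_steps = [
        Step("addition", "2 + 3", "5"),
        Step("multiplication", "5 * 4", "20"),
    ]
    for variant in scaffold("What is (2 + 3) * 4?", demo_steps):
        print(variant.given_steps, repr(variant.prompt))
```

Under this reading, a model that fails the unscaffolded variant but passes once the first step is given would point to that first step's skill as the likely gap.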
perfbench - MCP server for Granite benchmarking
[perfbench](./perfbench/) is an MCP server that manages LLM benchmark runs as asynchronous subprocesses, wrapping five benchmark runners (vLLM, AIPerf, GuideLLM, llama-bench, Ollama) behind a unified tool interface.
Use perfbench when you need to benchmark LLM serving endpoints or local inference and want an agent-driven workflow via the Model Context Protocol.
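The pattern described above, benchmark runs launched as asynchronous subprocesses behind one interface, can be sketched with the standard library alone. This is a rough illustration under stated assumptions, not perfbench's implementation: `RunManager`, `RUNNERS`, and the `demo` runner are hypothetical, the "benchmark" is a stand-in Python one-liner, and no MCP layer is shown.

```python
import asyncio
import sys
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


def _demo_argv(params: dict) -> List[str]:
    # Stand-in for a real benchmark CLI: prints a single fake metric line.
    rate = params.get("rate", 0)
    return [sys.executable, "-c", f"print('tokens_per_s={rate}')"]


# Hypothetical registry: runner name -> function that builds the command line.
RUNNERS: Dict[str, Callable[[dict], List[str]]] = {"demo": _demo_argv}


@dataclass
class Run:
    run_id: str
    process: asyncio.subprocess.Process
    output: Optional[str] = None


class RunManager:
    """Starts runs without blocking and lets callers poll or await them later."""

    def __init__(self) -> None:
        self._runs: Dict[str, Run] = {}

    async def start(self, runner: str, params: dict) -> str:
        argv = RUNNERS[runner](params)
        proc = await asyncio.create_subprocess_exec(
            *argv,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        run_id = uuid.uuid4().hex[:8]
        self._runs[run_id] = Run(run_id, proc)
        return run_id

    def status(self, run_id: str) -> str:
        code = self._runs[run_id].process.returncode
        if code is None:
            return "running"
        return "succeeded" if code == 0 else "failed"

    async def wait(self, run_id: str) -> str:
        run = self._runs[run_id]
        if run.output is None:
            # communicate() reads all output and waits for the process to exit.
            stdout, _ = await run.process.communicate()
            run.output = stdout.decode().strip()
        return run.output


async def main() -> Tuple[str, str]:
    manager = RunManager()
    run_id = await manager.start("demo", {"rate": 42})
    output = await manager.wait(run_id)
    return manager.status(run_id), output


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Keeping the per-runner differences inside the registry is what lets `start`, `status`, and `wait` stay identical for every backend; a tool server would expose those three operations rather than the individual CLIs.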
runtimes-validator
[runtimes-validator](./runtimes-validator/) is a unified validation framework for running model checks across inference engines (vLLM, llama.cpp, Ollama). It provides a CLI (`runtimes-validator`) to run automated validation tests against Granite models deployed on different backends. Two execution modes are supported: managed (the framework starts and stops the engine) and external (the framework connects to an already running engine).
Use runtimes-validator when you need to validate that a Granite model behaves correctly across different inference engines.
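The difference between the managed and external modes comes down to who owns the engine process. The sketch below shows one way that split could look; it is not the runtimes-validator code, and `engine_session`, `EngineSession`, the example URL, and the start command are all hypothetical. A real managed mode would also need a readiness check before running tests, which is omitted here.

```python
import subprocess
import sys
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class EngineSession:
    base_url: str
    process: Optional[subprocess.Popen]  # None when the engine is external


@contextmanager
def engine_session(
    mode: str, base_url: str, start_cmd: Optional[List[str]] = None
) -> Iterator[EngineSession]:
    """Yield a session; start and stop the engine only in managed mode."""
    if mode == "external":
        # The engine is already running elsewhere: nothing to start or stop.
        yield EngineSession(base_url, None)
        return
    if mode != "managed":
        raise ValueError(f"unknown mode: {mode!r}")
    if not start_cmd:
        raise ValueError("managed mode needs a start command")

    proc = subprocess.Popen(start_cmd)
    try:
        yield EngineSession(base_url, proc)
    finally:
        # Always stop what we started, even if the checks raised.
        proc.terminate()
        proc.wait(timeout=10)


if __name__ == "__main__":
    # Placeholder "engine": a Python process that just sleeps.
    fake_engine = [sys.executable, "-c", "import time; time.sleep(30)"]
    with engine_session("managed", "http://localhost:8000", fake_engine) as session:
        print("engine running:", session.process.poll() is None)
    print("engine stopped:", session.process.poll() is not None)
```

Wrapping both modes in the same context manager means the validation tests themselves never need to know which mode is active; they only see a base URL.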
Coming Soon
Additional debugging tools are being prepared for open-source release. Stay tuned!
Contributing
We welcome contributions! If you'd like to contribute to any of the tools in this repository, please open an issue or submit a pull request.
🚧 Work in Progress
This repository is actively evolving. We are continuously adding new debugging tools, expanding coverage, and refining existing functionality based on community feedback and ongoing research. Check back regularly for updates, and feel free to open an issue or discussion if you have suggestions or requests.
Notice
IBM Public Repository Disclosure: All content in this repository including code has been provided by IBM under the associated open source software license and IBM is under no obligation to provide enhancements, updates, or support. IBM developers produced this code as an open source project (not as an IBM product), and IBM makes no assertions as to the level of quality nor security, and will not be maintaining this code going forward.
Notability
Notability: 2.0/10. Routine new repo with very low traction.