togethercomputer/together-dgxc-benchmarking
forked from NVIDIA/dgxc-benchmarking
Description: DGXC Benchmarking provides recipes in ready-to-use templates for evaluating performance of specific AI use cases across hardware and software combinations.
License: NOASSERTION
Stars: 0
Forks: 0
Open issues: 0
Created: 2026-03-05T23:02:19Z
Pushed: 2026-05-29T22:33:46Z
Default branch: main
Fork: yes
Parent repository: NVIDIA/dgxc-benchmarking
Archived: no
README:
DGX Cloud Benchmarking - Performance Recipes
Performance Recipes are ready-to-use templates for evaluating performance of specific AI use cases across hardware and software combinations. These containerized recipes allow users to quickly set up and run standardized benchmarking methodology in their own environment, ensuring consistent and comparable results across platforms.
These Performance Recipes support performance characterization:
- across a variety of defined AI workloads, including pre-training, fine tuning, and inference.
- across GPU-based infrastructure, whether running on-premises or with cloud service providers (CSPs).
Each recipe maps to one workload and can be run at various cluster scales and precisions. These workloads are tested against the NVIDIA Reference Architecture and those results are provided as a baseline for comparison. These performance metrics are collected from production environments and are subject to real-world variability.
Prerequisites
To use the Performance Recipes, make sure you have the following prerequisites installed on your cluster:
General Prerequisites
- Bash 4.2 or newer
- Git LFS
- NGC Registry Access
- NGC CLI 3.148.1 or newer (optional; required only for NIM Inference workloads)
- Python 3.12.x
- CUDA: at least 12.3, recommended: 12.8 or newer
- NVIDIA driver: at least 535.129.03, recommended: 570.172.08 or newer
- OFED: 5.9-0.5.6.0.127 or newer
- NCCL: 2.19.4 or newer
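The minimum versions above can be checked before installation. The following is a minimal sketch, not part of the repository: the helper names `version_ge` and `check_min` are hypothetical, the comparison assumes a `sort` that supports `-V` (GNU coreutils), and only the Bash and driver checks are shown. The same helper could be pointed at the CUDA, OFED, and NCCL versions reported on your cluster.

```shell
#!/usr/bin/env bash
# Sketch: compare dotted version strings with `sort -V` (assumes GNU coreutils).

# version_ge A B: succeeds when version A is greater than or equal to version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME FOUND MINIMUM: print one status line for a prerequisite.
check_min() {
    if [ -z "$2" ]; then
        echo "MISSING $1 (need >= $3)"
    elif version_ge "$2" "$3"; then
        echo "OK $1 $2 (>= $3)"
    else
        echo "TOO OLD $1 $2 (need >= $3)"
    fi
}

# Bash exposes its own version; strip the "(1)-release" style suffix.
bash_version="${BASH_VERSION:-}"
bash_version="${bash_version%%[!0-9.]*}"
check_min "bash" "$bash_version" "4.2"

# The driver version is read from nvidia-smi when the tool exists on this node.
driver_version=""
if command -v nvidia-smi >/dev/null 2>&1; then
    driver_version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)"
fi
check_min "nvidia-driver" "$driver_version" "535.129.03"
```

On a login node without GPUs the driver line reports `MISSING`, so it is more informative to run the check on a compute node.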
Cluster-Specific Prerequisites
Depending on your cluster's job scheduler, ensure the following are met:
Quick Start Guide
> Important: Before proceeding with installation, please review the [Known Issues](#known-issues) section.
1. Clone the repository:
git clone https://github.com/NVIDIA/dgxc-benchmarking.git
cd dgxc-benchmarking
2. Set up Hugging Face access (required): Most recipes fetch model metadata (for example: tokenizer and config) from the Hugging Face Hub during installation. Unauthenticated access is heavily rate limited and commonly causes installation failures.
- Create a Hugging Face account (if you don't have one)
- Create an access token in Hugging Face settings
- Keep the Hugging Face token handy. The installer will prompt for HF_TOKEN (if HF_TOKEN is already set in your environment, the installer will use it as the default)
Gated model access (important): Some recipes use gated Hugging Face model repositories (for example: Llama). Even with HF_TOKEN, you must request repo access separately. Approvals are not instantaneous—request access early.
See [Model Access Requirements](#model-access-requirements) for the list of recipes that require additional approval.
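Because the installer uses an existing HF_TOKEN as its default, it can help to confirm the variable is set before starting. The sketch below is illustrative only: `hf_token_state` is a hypothetical helper, and the `hf_` prefix check is an assumption about the current format of Hugging Face user access tokens, not something the installer is known to enforce.

```shell
# Sketch: report whether HF_TOKEN looks usable, without printing its value.
# Set the token first in your own shell, for example:
#   export HF_TOKEN="<your token>"

hf_token_state() {
    case "${HF_TOKEN:-}" in
        "")   echo "unset" ;;
        hf_*) echo "set" ;;                # assumed prefix of user access tokens
        *)    echo "unexpected-format" ;;
    esac
}

hf_token_state
```

A token that passes this check still does not grant access to gated repositories; that approval is requested separately, per model.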
3. (Optional) For NIM Inference workloads only:
- Generate an NGC API key from the NGC Registry
- Install and configure the NGC CLI:
x86
curl -L https://ngc.nvidia.com/downloads/ngccli_linux.zip -o ngccli_linux.zip
unzip -q ngccli_linux.zip -d $HOME/.local/bin
rm ngccli_linux.zip
export PATH=$HOME/.local/bin:$PATH
ngc config set
arm64
curl -L https://ngc.nvidia.com/downloads/ngccli_arm64.zip -o ngccli_arm64.zip
unzip -q ngccli_arm64.zip -d $HOME/.local/bin
rm ngccli_arm64.zip
export PATH=$HOME/.local/bin/ngc-cli:$PATH
ngc config set
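The choice between the two archives can be derived from the machine architecture. This is a small sketch under stated assumptions: `ngc_cli_zip` is a hypothetical helper, and the mapping from `uname -m` values (`x86_64`, `aarch64`, `arm64`) to the two archive names above is inferred from the x86 and arm64 labels rather than taken from NGC documentation.

```shell
# Sketch: pick the NGC CLI archive name for the current architecture.

# ngc_cli_zip ARCH: map a `uname -m` value to one of the archive names above.
ngc_cli_zip() {
    case "$1" in
        x86_64)        echo "ngccli_linux.zip" ;;
        aarch64|arm64) echo "ngccli_arm64.zip" ;;
        *)             echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Print the download URL for this node; nothing is downloaded here.
if zip_name="$(ngc_cli_zip "$(uname -m)")"; then
    echo "https://ngc.nvidia.com/downloads/${zip_name}"
fi
```

The printed URL can then be passed to the `curl` command shown for the matching architecture.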
4. Run the installer:
Important: Installation may take several hours, influenced by selected recipes, internet speed, and your current node's resources. Consider using a tool like tmux or screen.
The command below sets up a supported Python environment (reusing your current uv, venv, or conda environment if compatible; otherwise creating ../llmb_venv, one directory above the repo) and then launches the interactive installer.
./install.sh
The installer will:
- Ensure Python 3.12.x support via uv, venv, or conda (creating a venv if needed)
- Install the CLI tools (llmb-run, llmb-install)
- Prompt you to configure your cluster and select workloads to install
> Note: For detailed installation options, workload-specific virtual environments, and troubleshooting, see the [Installer README](cli/llmb-install/README.md).
5. Run a benchmark:
# Navigate to your installed workload directory
cd $LLMB_INSTALL
# Example: Run Llama 3.1 405B pretraining on 256 GPUs with FP8 precision
llmb-run submit -w pretrain_llama3.1 -s 405b --dtype fp8 --scale 256
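Since each recipe can be run at several cluster scales, the submit command above lends itself to a simple sweep. The sketch below only prints the commands as a dry run. `build_submit_cmd` is a hypothetical helper, and the scales 128 and 512 are illustrative assumptions; check the workload's own README for the scales it actually supports. Only the flags shown in the example above (`-w`, `-s`, `--dtype`, `--scale`) are used.

```shell
# Sketch: print llmb-run submit commands for several scales (dry run only).

# build_submit_cmd WORKLOAD SIZE DTYPE SCALE: format one submit command line.
build_submit_cmd() {
    printf 'llmb-run submit -w %s -s %s --dtype %s --scale %s\n' "$1" "$2" "$3" "$4"
}

# Hypothetical list of GPU counts; nothing is submitted to the scheduler.
for scale in 128 256 512; do
    build_submit_cmd pretrain_llama3.1 405b fp8 "$scale"
done
```

Once the printed commands look right, they can be run from `$LLMB_INSTALL` one at a time.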
Shell Completion (Optional)
Enable tab completion for llmb-run commands and options:
llmb-run --install-completion
Restart your shell after installation for changes to take effect.
Directory Layout and Key Variables
After running the installer, the following directory structure is created:
- LLMB_REPO: Directory containing the clone of the recipe repository.
- LLMB_INSTALL: Top-level directory for all benchmarking artifacts (images, datasets, venvs, workloads, etc).
- LLMB_WORKLOAD: Workload-specific directory, e.g. ${LLMB_INSTALL}/workloads/pretrain_nemotron4.
- Results, logs, and checkpoints are stored under subfolders of LLMB_WORKLOAD (see below).
Example structure:
$LLMB_INSTALL/
├── images/
├── datasets/
├── venvs/
└── workloads/
    └── pretrain_nemotron4/   # <- $LLMB_WORKLOAD
        ├── NeMo/
        ├── ...
        └── experiments/
LLMB_REPO, LLMB_INSTALL, and LLMB_WORKLOAD are shorthand terms for directory locations; LLMB_INSTALL is the only environment variable that needs to be set by the user.
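Because LLMB_WORKLOAD is only shorthand, a script that needs the path has to build it from LLMB_INSTALL. A minimal sketch follows, assuming the `workloads/<name>` layout shown above; `workload_dir` is a hypothetical helper and `/lustre/llmb` is a placeholder install location.

```shell
# Sketch: derive a workload directory from LLMB_INSTALL.

# workload_dir NAME: print ${LLMB_INSTALL}/workloads/NAME, or fail if unset.
workload_dir() {
    if [ -z "${LLMB_INSTALL:-}" ]; then
        echo "LLMB_INSTALL is not set" >&2
        return 1
    fi
    # Strip one trailing slash so the result has no doubled separator.
    printf '%s/workloads/%s\n' "${LLMB_INSTALL%/}" "$1"
}

# Placeholder install location, for illustration only.
LLMB_INSTALL="/lustre/llmb"
LLMB_WORKLOAD="$(workload_dir pretrain_nemotron4)"
echo "${LLMB_WORKLOAD}/experiments"
```

With the placeholder value, the last line prints `/lustre/llmb/workloads/pretrain_nemotron4/experiments`.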
Workload Resources Overview
Each workload resource includes:
- Configuration details: Comprehensive software and hardware…
Notability: 2.0/10. Routine fork of benchmarking repo.