Fork · Together AI · published Mar 5, 2026

togethercomputer/together-dgxc-benchmarking

forked from NVIDIA/dgxc-benchmarking

Description: DGXC Benchmarking provides recipes in ready-to-use templates for evaluating performance of specific AI use cases across hardware and software combinations.

License: NOASSERTION

Stars: 0

Forks: 0

Open issues: 0

Created: 2026-03-05T23:02:19Z

Pushed: 2026-05-29T22:33:46Z

Default branch: main

Fork: yes

Parent repository: NVIDIA/dgxc-benchmarking

Archived: no

README:

DGX Cloud Benchmarking - Performance Recipes

Performance Recipes are ready-to-use templates for evaluating performance of specific AI use cases across hardware and software combinations. These containerized recipes allow users to quickly set up and run standardized benchmarking methodology in their own environment, ensuring consistent and comparable results across platforms.

These Performance Recipes support performance characterization:

  • across a variety of defined AI workloads, including pre-training, fine tuning, and inference.
  • across GPU-based infrastructure, whether running on-premises or with cloud service providers (CSPs).

Each recipe maps to one workload and can be run at various cluster scales and precisions. These workloads are tested against the NVIDIA Reference Architecture and those results are provided as a baseline for comparison. These performance metrics are collected from production environments and are subject to real-world variability.

Prerequisites

To use the Performance Recipes, make sure you have the following prerequisites installed on your cluster:

General Prerequisites

  • Bash 4.2 or newer
  • Git LFS
  • NGC Registry Access
  • NGC CLI 3.148.1 or newer (optional; required only for NIM Inference workloads)
  • Python 3.12.x
  • CUDA: at least 12.3; recommended: 12.8 or newer
  • NVIDIA driver: at least 535.129.03; recommended: 570.172.08 or newer
  • OFED: 5.9-0.5.6.0.127 or newer
  • NCCL: 2.19.4 or newer
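
A few of these prerequisites can be checked from a login node before installing. The following is a minimal sketch, not part of the repository: `version_ge` and `check_tool` are hypothetical helpers, and the comparison assumes a `sort` that supports `-V` (GNU coreutils). It only covers Bash, Python, Git LFS, and the presence of `nvidia-smi`; CUDA, driver, OFED, and NCCL versions still need to be verified separately.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Assumes a sort(1) with -V (version sort), as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1 found"
    else
        echo "missing: $1"
    fi
}

# Bash 4.2 or newer (strip the "(1)-release" suffix from BASH_VERSION).
bash_ver="${BASH_VERSION:-0}"
bash_ver="${bash_ver%%[^0-9.]*}"
if version_ge "$bash_ver" "4.2"; then
    echo "ok: bash $bash_ver"
else
    echo "too old: bash $bash_ver (need 4.2 or newer)"
fi

# Python 3.12.x
if command -v python3 >/dev/null 2>&1; then
    py_ver="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
    if [ "$py_ver" = "3.12" ]; then
        echo "ok: python $py_ver"
    else
        echo "found python $py_ver (need 3.12.x)"
    fi
else
    echo "missing: python3"
fi

check_tool git-lfs
check_tool nvidia-smi
```

A missing tool is reported rather than treated as fatal, so the script can be run on any node to get a quick overview.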

Cluster-Specific Prerequisites

Depending on your cluster's job scheduler, ensure the following are met:

  • Slurm Clusters
      • Version 22.x or newer
      • task/affinity plugin required for process pinning
      • Enroot
      • Pyxis
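
The Slurm requirements can be probed in a similar way. This sketch is an illustration rather than a tool shipped with the recipes: `slurm_major` is a hypothetical helper, and the script assumes that `sinfo --version` prints a line of the form `slurm 23.02.7` and that Pyxis, when loaded, adds a `--container-image` option to `srun`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the major version from a string such as
# "slurm 23.02.7" (the assumed format of `sinfo --version`).
slurm_major() {
    printf '%s\n' "$1" | awk '{ split($2, v, "."); print v[1] }'
}

if command -v sinfo >/dev/null 2>&1; then
    major="$(slurm_major "$(sinfo --version)")"
    if [ "${major:-0}" -ge 22 ]; then
        echo "ok: Slurm major version $major"
    else
        echo "too old: Slurm major version $major (need 22.x or newer)"
    fi

    # task/affinity should be listed in the cluster's TaskPlugin setting.
    if scontrol show config | grep -q 'task/affinity'; then
        echo "ok: task/affinity plugin configured"
    else
        echo "missing: task/affinity plugin"
    fi

    # Pyxis registers --container-image with srun when it is loaded.
    if srun --help 2>&1 | grep -q -- '--container-image'; then
        echo "ok: Pyxis detected"
    else
        echo "missing: Pyxis (no --container-image option in srun)"
    fi
else
    echo "sinfo not found; run this check on a Slurm login node"
fi

if command -v enroot >/dev/null 2>&1; then
    echo "ok: enroot found"
else
    echo "missing: enroot"
fi
```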

Quick Start Guide

> Important: Before proceeding with installation, please review the [Known Issues](#known-issues) section.

1. Clone the repository:

git clone https://github.com/NVIDIA/dgxc-benchmarking.git
cd dgxc-benchmarking

2. Set up Hugging Face access (required): Most recipes fetch model metadata (for example: tokenizer and config) from the Hugging Face Hub during installation. Unauthenticated access is heavily rate limited and commonly causes installation failures.

  • Create a Hugging Face account (if you don't have one)
  • Create an access token in Hugging Face settings
  • Keep the token handy: the installer prompts for HF_TOKEN and, if HF_TOKEN is already set in your environment, uses it as the default.

Gated model access (important): Some recipes use gated Hugging Face model repositories (for example: Llama). Even with HF_TOKEN set, you must request access to each gated repository separately. Approvals are not instantaneous, so request access early.

See [Model Access Requirements](#model-access-requirements) for the list of recipes that require additional approval.
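
Because the installer picks up HF_TOKEN from the environment, it can help to confirm the variable is set before starting a long installation. A minimal sketch, assuming nothing beyond the HF_TOKEN variable named above; `hf_token_ready` is a hypothetical helper and does not check that the token is valid or that gated access has been approved.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only if HF_TOKEN is non-empty.
hf_token_ready() {
    if [ -n "${HF_TOKEN:-}" ]; then
        echo "HF_TOKEN is set; the installer will use it as the default"
    else
        echo "HF_TOKEN is not set; the installer will prompt for it" >&2
        return 1
    fi
}

# To set the token for the current shell without echoing it to the terminal
# (the -s and -p flags are bash-specific):
#   read -r -s -p "Hugging Face token: " HF_TOKEN && export HF_TOKEN

hf_token_ready || true
```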

3. (Optional) For NIM Inference workloads only:

  • Generate an NGC API key from the NGC Registry
  • Install and configure the NGC CLI:

x86

curl -L https://ngc.nvidia.com/downloads/ngccli_linux.zip -o ngccli_linux.zip
unzip -q ngccli_linux.zip -d $HOME/.local/bin
rm ngccli_linux.zip
export PATH=$HOME/.local/bin:$PATH
ngc config set

arm64

curl -L https://ngc.nvidia.com/downloads/ngccli_arm64.zip -o ngccli_arm64.zip
unzip -q ngccli_arm64.zip -d $HOME/.local/bin
rm ngccli_arm64.zip
export PATH=$HOME/.local/bin/ngc-cli:$PATH
ngc config set
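
The two snippets above differ in the archive name and in the PATH entry they export. The sketch below selects the archive from `uname -m` and locates the unpacked `ngc` binary instead of assuming where it lands. Both function names are hypothetical, and the mapping of `x86_64` and `aarch64`/`arm64` to the two archive names is an assumption based on the labels above.

```shell
#!/usr/bin/env bash
# Map a machine architecture (as printed by `uname -m`) to the archive name
# used in the install snippets above.
ngc_zip_for_arch() {
    case "$1" in
        x86_64)        echo "ngccli_linux.zip" ;;
        aarch64|arm64) echo "ngccli_arm64.zip" ;;
        *)             echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the directory under $1 that holds the `ngc`
# binary, whether it was unpacked at the top level or one directory down.
locate_ngc_dir() {
    ngc_path="$(find "$1" -maxdepth 2 -type f -name ngc 2>/dev/null | head -n 1)"
    [ -n "$ngc_path" ] && dirname "$ngc_path"
}

# Print the download URL for this machine without downloading anything.
if zip_name="$(ngc_zip_for_arch "$(uname -m)")"; then
    echo "https://ngc.nvidia.com/downloads/${zip_name}"
fi
```

After unzipping, `export PATH="$(locate_ngc_dir "$HOME/.local/bin"):$PATH"` would add whichever directory actually contains the binary.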

4. Run the installer:

Important: Installation may take several hours, depending on the selected recipes, your internet speed, and the resources of the node you are running on. Consider running it inside a tool like tmux or screen so the session survives a disconnect.

The install script sets up a supported Python environment, reusing your current uv, venv, or conda environment if it is compatible and otherwise creating ../llmb_venv one directory above the repo. It then launches the interactive installer.

./install.sh

The installer will:

  • Ensure Python 3.12.x support via uv, venv, or conda (creating a venv if needed)
  • Install the CLI tools (llmb-run, llmb-install)
  • Prompt you to configure your cluster and select workloads to install

> Note: For detailed installation options, workload-specific virtual environments, and troubleshooting, see the [Installer README](cli/llmb-install/README.md).

5. Run a benchmark:

# Navigate to your installed workload directory
cd $LLMB_INSTALL

# Example: Run Llama 3.1 405B pretraining on 256 GPUs with FP8 precision
llmb-run submit -w pretrain_llama3.1 -s 405b --dtype fp8 --scale 256
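
To compare several cluster scales, the same command can be generated in a loop. This sketch only prints the commands so they can be reviewed before submitting. `build_submit_cmd` is a hypothetical helper, it uses only the flags shown in the example above, and the scales listed are illustrative; check each recipe for the scales it actually supports.

```shell
#!/usr/bin/env bash
# Build (but do not run) an llmb-run submit command from its four parameters.
build_submit_cmd() {
    workload="$1"; size="$2"; dtype="$3"; scale="$4"
    printf 'llmb-run submit -w %s -s %s --dtype %s --scale %s\n' \
        "$workload" "$size" "$dtype" "$scale"
}

# Illustrative scales only (an assumption, not taken from the recipe).
for scale in 128 256 512; do
    build_submit_cmd pretrain_llama3.1 405b fp8 "$scale"
done
```

Piping the output through `sh` from inside `$LLMB_INSTALL` would submit the jobs once the printed commands look right.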

Shell Completion (Optional)

Enable tab completion for llmb-run commands and options:

llmb-run --install-completion

Restart your shell after installation for changes to take effect.

Directory Layout and Key Variables

After running the installer, the following directory structure is created:

  • LLMB_REPO: Directory containing the clone of the recipe repository.
  • LLMB_INSTALL: Top-level directory for all benchmarking artifacts (images, datasets, venvs, workloads, etc).
  • LLMB_WORKLOAD: Workload-specific directory, e.g. ${LLMB_INSTALL}/workloads/pretrain_nemotron4.
  • Results, logs, and checkpoints are stored under subfolders of LLMB_WORKLOAD (see below).

Example structure:

$LLMB_INSTALL/
├── images/
├── datasets/
├── venvs/
└── workloads/
    └── pretrain_nemotron4/   # <- $LLMB_WORKLOAD
        ├── NeMo/
        ├── ...
        └── experiments/
LLMB_REPO, LLMB_INSTALL, and LLMB_WORKLOAD are shorthand terms for directory locations; LLMB_INSTALL is the only environment variable that needs to be set by the user.
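
Since only LLMB_INSTALL is a real environment variable, the workload directory has to be derived from it. A minimal sketch of that convention follows; `workload_dir` is a hypothetical helper, and the fallback install path is an example, not a default used by the installer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the directory for workload $1, following the
# ${LLMB_INSTALL}/workloads/<name> convention described above.
workload_dir() {
    printf '%s/workloads/%s\n' "${LLMB_INSTALL:?LLMB_INSTALL must be set}" "$1"
}

# Example install location (an assumption; substitute your own path).
LLMB_INSTALL="${LLMB_INSTALL:-$HOME/llmb_install}"
export LLMB_INSTALL

LLMB_WORKLOAD="$(workload_dir pretrain_nemotron4)"
echo "$LLMB_WORKLOAD"

# List experiment output if the workload has been installed and run.
if [ -d "$LLMB_WORKLOAD/experiments" ]; then
    ls "$LLMB_WORKLOAD/experiments"
fi
```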

Workload Resources Overview

Each workload resource includes:

  • Configuration details: Comprehensive software and hardware…

Notability

notability 2.0/10

Routine fork of benchmarking repo