What does this fork signal mean?

Together AI forked NVIDIA/dgxc-benchmarking as togethercomputer/together-dgxc-benchmarking. The fork points to upstream code the lab may be inspecting, patching, or building on. High-signal details: repo togethercomputer/together-dgxc-benchmarking · parent NVIDIA/dgxc-benchmarking · Routine fork of benchmarking repo. onlylabs links this event to 1 captured evidence page and 6 related fork signals.

Together AI Fork: togethercomputer/together-dgxc-benchmarking

Captured source

GitHub/github.com/togethercomputer/together-dgxc-benchmarking

togethercomputer/together-dgxc-benchmarking repository metadata

published Mar 5, 2026 · seen Jun 5 · captured Jun 11 · http 200 · method plain

togethercomputer/together-dgxc-benchmarking

Description: DGXC Benchmarking provides recipes in ready-to-use templates for evaluating performance of specific AI use cases across hardware and software combinations.

License: NOASSERTION

Stars: 0

Forks: 0

Open issues: 0

Created: 2026-03-05T23:02:19Z

Pushed: 2026-05-29T22:33:46Z

Default branch: main

Fork: yes

Parent repository: NVIDIA/dgxc-benchmarking

Archived: no

README:

DGX Cloud Benchmarking - Performance Recipes

Performance Recipes are ready-to-use templates for evaluating performance of specific AI use cases across hardware and software combinations. These containerized recipes allow users to quickly set up and run standardized benchmarking methodology in their own environment, ensuring consistent and comparable results across platforms.

These Performance Recipes support performance characterization:

across a variety of defined AI workloads, including pre-training, fine-tuning, and inference.
across GPU-based infrastructure, whether running on-premises or with cloud service providers (CSPs).

Each recipe maps to one workload and can be run at various cluster scales and precisions. These workloads are tested against the NVIDIA Reference Architecture and those results are provided as a baseline for comparison. These performance metrics are collected from production environments and are subject to real-world variability.

Prerequisites

To use the Performance Recipes, make sure you have the following prerequisites installed on your cluster:

General Prerequisites

Bash 4.2 or newer
Git LFS
NGC Registry Access
NGC CLI 3.148.1 or newer (optional; required only for NIM Inference workloads)
Python 3.12.x
CUDA: at least 12.3, recommended: 12.8 or newer
NV Driver: at least 535.129.03, recommended 570.172.08 or newer
OFED: 5.9-0.5.6.0.127 or newer
NCCL: 2.19.4 or newer
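
The minimum versions above can be checked before installing. The following is a minimal sketch and not part of the repository: `version_ge` and `check_min` are hypothetical helper names, and the comparison assumes GNU `sort -V` is available. It probes only Bash, Python, and the NVIDIA driver; the remaining prerequisites would need their own checks.

```shell
# version_ge A B: succeed when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_min NAME FOUND MINIMUM: print a one-line verdict.
check_min() {
    if version_ge "$2" "$3"; then
        echo "ok   $1 $2 (>= $3)"
    else
        echo "FAIL $1 $2 (< $3)"
    fi
}

# Ask bash itself for its version so this also works when sourced from sh.
bash_ver="$(bash -c 'echo "${BASH_VERSINFO[0]}.${BASH_VERSINFO[1]}"')"
check_min "Bash" "$bash_ver" "4.2"

# Python must be a 3.12.x release, so match the prefix rather than a minimum.
if command -v python3 >/dev/null 2>&1; then
    py="$(python3 -c 'import platform; print(platform.python_version())')"
    case "$py" in
        3.12.*) echo "ok   Python $py" ;;
        *)      echo "FAIL Python $py (need 3.12.x)" ;;
    esac
fi

# The driver check only runs on nodes where nvidia-smi is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    drv="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)"
    check_min "NVIDIA driver" "$drv" "535.129.03"
fi
```

Run it on a compute node rather than a login node if the two differ, since driver and CUDA versions are properties of the GPU nodes.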

Cluster-Specific Prerequisites

Depending on your cluster's job scheduler, ensure the following are met:

Slurm Clusters
Version 22.x or newer
task/affinity plugin required for process pinning
Enroot
Pyxis
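
The Slurm requirements can be probed from a login node. This is a sketch under assumptions rather than a script shipped with the recipes: `slurm_major` is a hypothetical helper, and the Pyxis check assumes that Pyxis exposes its `--container-image` option in `srun --help`.

```shell
# slurm_major: extract the major version from `sinfo --version` output,
# e.g. "slurm 23.02.7" -> 23. Hypothetical helper, not part of the repo.
slurm_major() {
    echo "$1" | sed -n 's/^slurm \([0-9][0-9]*\)\..*/\1/p'
}

if command -v sinfo >/dev/null 2>&1; then
    major="$(slurm_major "$(sinfo --version)")"
    if [ "${major:-0}" -ge 22 ]; then
        echo "ok   Slurm major version $major"
    else
        echo "FAIL Slurm major version ${major:-unknown} (need 22 or newer)"
    fi

    # task/affinity must appear in the configured TaskPlugin list.
    if scontrol show config | grep -i '^TaskPlugin ' | grep -q 'task/affinity'; then
        echo "ok   task/affinity plugin enabled"
    else
        echo "FAIL task/affinity plugin not found in TaskPlugin"
    fi

    # Pyxis adds container flags such as --container-image to srun.
    if srun --help 2>&1 | grep -q -- '--container-image'; then
        echo "ok   Pyxis srun options present"
    else
        echo "FAIL Pyxis srun options missing"
    fi
else
    echo "skip Slurm checks: sinfo not found"
fi

if command -v enroot >/dev/null 2>&1; then
    echo "ok   enroot found"
else
    echo "FAIL enroot not found"
fi
```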

Quick Start Guide

> Important: Before proceeding with installation, please review the [Known Issues](#known-issues) section.

1. Clone the repository:

git clone https://github.com/NVIDIA/dgxc-benchmarking.git
cd dgxc-benchmarking

2. Set up Hugging Face access (required): Most recipes fetch model metadata (for example: tokenizer and config) from the Hugging Face Hub during installation. Unauthenticated access is heavily rate limited and commonly causes installation failures.

Create a Hugging Face account (if you don't have one)
Create an access token in Hugging Face settings
Keep the Hugging Face token handy. The installer will prompt for HF_TOKEN (if HF_TOKEN is already set in your environment, the installer will use it as the default)

Gated model access (important): Some recipes use gated Hugging Face model repositories (for example: Llama). Even with HF_TOKEN, you must request repo access separately. Approvals are not instantaneous—request access early.

See [Model Access Requirements](#model-access-requirements) for the list of recipes that require additional approval.
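
Because the installer uses an existing HF_TOKEN as its default, the token can be exported before the installer starts. A small guard like the one below (a sketch; `require_hf_token` is a hypothetical name, not an installer function) confirms that the variable is set without printing the token itself. It does not verify that the token is valid or that gated-repo access has been granted.

```shell
# require_hf_token: succeed only when HF_TOKEN is set and non-empty.
# Prints the token length, never the token value.
require_hf_token() {
    if [ -z "${HF_TOKEN:-}" ]; then
        echo "HF_TOKEN is not set; the installer will prompt for it" >&2
        return 1
    fi
    echo "HF_TOKEN is set (${#HF_TOKEN} characters)"
}

# Typical use: export the token in the shell, then run the guard.
#   export HF_TOKEN=...   # paste your own token here
require_hf_token || echo "continuing without a preset token"
```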

3. (Optional) For NIM Inference workloads only:

Generate an NGC API key from the NGC Registry
Install and configure the NGC CLI:

x86

curl -L https://ngc.nvidia.com/downloads/ngccli_linux.zip -o ngccli_linux.zip
unzip -q ngccli_linux.zip -d $HOME/.local/bin
rm ngccli_linux.zip
export PATH=$HOME/.local/bin/ngc-cli:$PATH
ngc config set

arm64

curl -L https://ngc.nvidia.com/downloads/ngccli_arm64.zip -o ngccli_arm64.zip
unzip -q ngccli_arm64.zip -d $HOME/.local/bin
rm ngccli_arm64.zip
export PATH=$HOME/.local/bin/ngc-cli:$PATH
ngc config set
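
The two variants above differ only in the archive name, so the choice can be derived from `uname -m`. The sketch below uses the two download URLs given above; `ngc_zip_for_arch` is a hypothetical helper, and the mapping of `uname -m` values to archives is an assumption for illustration.

```shell
# ngc_zip_for_arch ARCH: print the NGC CLI archive name for a machine type.
ngc_zip_for_arch() {
    case "$1" in
        x86_64)        echo "ngccli_linux.zip" ;;
        aarch64|arm64) echo "ngccli_arm64.zip" ;;
        *)             echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Print the download URL for the current machine; nothing is downloaded here.
if zip_name="$(ngc_zip_for_arch "$(uname -m)")"; then
    echo "download: https://ngc.nvidia.com/downloads/$zip_name"
fi
```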

4. Run the installer:

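Important: Installation may take several hours, depending on the selected recipes, your internet speed, and the resources of the node you run it on. Consider running it inside a terminal multiplexer such as tmux or screen so that a dropped connection does not interrupt it.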

The command below sets up a supported Python environment (reusing your current uv/venv/conda env if compatible, otherwise creating ../llmb_venv one directory above the repo), then launches the interactive installer.

./install.sh

The installer will:

Ensure Python 3.12.x support via uv, venv, or conda (creating a venv if needed)
Install the CLI tools (llmb-run, llmb-install)
Prompt you to configure your cluster and select workloads to install

> Note: For detailed installation options, workload-specific virtual environments, and troubleshooting, see the [Installer README](cli/llmb-install/README.md).
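
Since the installer is interactive and long-running, it helps to start it inside a named multiplexer session that can be reattached later. The sketch below only prints a suggested command; `session_cmd` and the session name `llmb-install` are arbitrary choices for illustration, not something the installer requires.

```shell
# session_cmd: print a command that opens a named session in whichever
# multiplexer is installed, or "none" when neither tmux nor screen exists.
session_cmd() {
    if command -v tmux >/dev/null 2>&1; then
        echo "tmux new-session -s llmb-install"
    elif command -v screen >/dev/null 2>&1; then
        echo "screen -S llmb-install"
    else
        echo "none"
    fi
}

echo "suggested session command: $(session_cmd)"
```

Run the printed command first, then start ./install.sh inside the new session.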

5. Run a benchmark:

# Navigate to your installed workload directory
cd $LLMB_INSTALL

# Example: Run Llama 3.1 405B pretraining on 256 GPUs with FP8 precision
llmb-run submit -w pretrain_llama3.1 -s 405b --dtype fp8 --scale 256
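
# A sweep over several cluster scales can reuse the same command. The sketch
# below uses only the flags shown in the example above; the scales 128 and 512
# are assumptions for illustration, and whether this recipe supports them is
# not confirmed here. By default it prints the commands instead of submitting.

```shell
# build_submit_cmd SCALE: print the submit command for one scale.
build_submit_cmd() {
    echo "llmb-run submit -w pretrain_llama3.1 -s 405b --dtype fp8 --scale $1"
}

for scale in 128 256 512; do
    cmd="$(build_submit_cmd "$scale")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"      # dry run: show what would be submitted
    else
        $cmd             # DRY_RUN=0: actually submit the job
    fi
done
```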

Shell Completion (Optional)

Enable tab completion for llmb-run commands and options:

llmb-run --install-completion

Restart your shell after installation for changes to take effect.

Directory Layout and Key Variables

After running the installer, the following directory structure is created:

LLMB_REPO: Directory containing the clone of the recipe repository.
LLMB_INSTALL: Top-level directory for all benchmarking artifacts (images, datasets, venvs, workloads, etc).
LLMB_WORKLOAD: Workload-specific directory, e.g. ${LLMB_INSTALL}/workloads/pretrain_nemotron4.
Results, logs, and checkpoints are stored under subfolders of LLMB_WORKLOAD (see below).

Example structure:

$LLMB_INSTALL/
├── images/
├── datasets/
├── venvs/
└── workloads/
    └── pretrain_nemotron4/   # <- $LLMB_WORKLOAD
        ├── NeMo/
        ├── ...
        └── experiments/

LLMB_REPO, LLMB_INSTALL, and LLMB_WORKLOAD are shorthand terms for directory locations; LLMB_INSTALL is the only environment variable that needs to be set by the user.
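
Because only LLMB_INSTALL is a real environment variable, a workload path has to be derived from it. The sketch below follows the layout described above; `workload_dir` is a hypothetical helper, and `/path/to/install` is a placeholder rather than a default used by the tools.

```shell
# workload_dir INSTALL_DIR WORKLOAD_NAME: print the workload directory,
# tolerating a trailing slash on INSTALL_DIR.
workload_dir() {
    echo "${1%/}/workloads/$2"
}

LLMB_INSTALL="${LLMB_INSTALL:-/path/to/install}"   # placeholder, set your own
LLMB_WORKLOAD="$(workload_dir "$LLMB_INSTALL" pretrain_nemotron4)"

echo "workload dir:    $LLMB_WORKLOAD"
echo "experiments dir: $LLMB_WORKLOAD/experiments"
```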

Workload Resources Overview

Each workload resource includes:

Configuration details: Comprehensive software and hardware...

Excerpt shown — open the source for the full document.

Notability

notability 2.0/10

Routine fork of benchmarking repo