Repo · NVIDIA · published Mar 3, 2026

NVIDIA/Ising-Decoding

Jupyter Notebook

Description: A set of training recipes for AI Quantum Error Correction Decoders

Language: Jupyter Notebook

License: Apache-2.0

Stars: 107

Forks: 39

Open issues: 0

Created: 2026-03-03T01:07:10Z

Pushed: 2026-06-01T14:49:40Z

Default branch: main

Fork: no

Archived: no

README:

Ising Decoding

This repo offers AI training recipes to build, customize and deploy scalable quantum error correction decoders:

  • A neural network consumes detector syndromes across space and time
  • It predicts corrections that reduce syndrome density / improve decoding
  • A standard decoder (PyMatching) produces the final logical decision
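The division of labour in the three bullets above can be illustrated with a toy example. This is a minimal sketch under loud assumptions: it uses a 1D repetition code instead of a surface code, a hand-written local rule (`toy_predecoder`, hypothetical) in place of the neural network, and a brute-force minimum-weight choice (`toy_global_decoder`, hypothetical) in place of PyMatching. None of these names come from the repository; the point is only to show a pre-decoder thinning the syndrome before a standard decoder finishes the job.

```python
def syndrome(error):
    """Parity checks of a repetition code: check i compares data bits i and i+1."""
    return [error[i] ^ error[i + 1] for i in range(len(error) - 1)]


def toy_predecoder(syn):
    """Hypothetical stand-in for the neural network (hand-written rule, no learning).

    Whenever two neighbouring checks both fire, flip the data bit they share.
    Returns the proposed correction and the residual (sparser) syndrome.
    """
    correction = [0] * (len(syn) + 1)
    residual = list(syn)
    for i in range(len(residual) - 1):
        if residual[i] and residual[i + 1]:
            correction[i + 1] ^= 1
            residual[i] = residual[i + 1] = 0
    return correction, residual


def toy_global_decoder(syn):
    """Hypothetical stand-in for PyMatching: lowest-weight error consistent with syn."""
    candidate = [0]
    for s in syn:
        candidate.append(candidate[-1] ^ s)
    flipped = [bit ^ 1 for bit in candidate]
    return candidate if sum(candidate) <= sum(flipped) else flipped


# One isolated bit flip (index 2) and one two-bit chain (indices 6 and 7).
error = [0, 0, 1, 0, 0, 0, 1, 1, 0]
syn = syndrome(error)
pre_correction, residual_syn = toy_predecoder(syn)
global_correction = toy_global_decoder(residual_syn)
total_correction = [a ^ b for a, b in zip(pre_correction, global_correction)]
remaining_error = [e ^ c for e, c in zip(error, total_correction)]

print("syndrome weight before pre-decoding:", sum(syn))
print("syndrome weight after pre-decoding: ", sum(residual_syn))
print("remaining error:", remaining_error)
```

In this toy run the local rule removes the isolated bit flip on its own, and the two-bit chain is left for the global decoder. The repository's network additionally works across time (repeated syndrome rounds), which this sketch ignores.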

The public release exposes a single user-facing config and a single runner script.

![Pre-decoder pipeline](images/predecoder_pipeline.png)

Table of Contents

  • [Publication](#publication)
  • [High-level workflow](#high-level-workflow)
  • [Quick start (train + inference)](#quick-start-train--inference)
  • [Dependencies](#dependencies)
  • [Troubleshooting](#troubleshooting)
  • [Inference (pre-trained models)](#inference-pre-trained-models)
  • [Model export and downstream tools](#model-export-and-downstream-tools)
  • [Converting .pt checkpoints to SafeTensors](#converting-pt-checkpoints-to-safetensors-optional-post-training)
  • [ONNX export and quantization](#onnx-export-and-quantization-optional-post-training)
  • [Generating data for CUDA-Q QEC](#generating-data-for-cuda-q-qec-realtime-predecoder-test-application)
  • [Offline decoding from Stim detector samples](#offline-decoding-from-stim-detector-samples)
  • [Decoder ablation study with cudaq-qec](#decoder-ablation-study-with-cudaq-qec-optional)
  • [Configuration and advanced usage](#configuration-and-advanced-usage)
  • [GPU selection](#gpu-selection)
  • [Public configuration](#public-configuration-confconfig_publicyaml)
  • [Precomputed frames](#precomputed-frames-recommended)
  • [Resuming training and running inference](#resuming-training-and-running-inference-on-a-trained-model)
  • [Logging and outputs](#logging-and-outputs)
  • [What gets written where](#what-gets-written-where)
  • [Evaluation defaults](#evaluation-defaults-public-release)
  • [Testing and CI](#testing-and-ci)
  • [Testing (CPU + GPU)](#testing-cpu--gpu)
  • [CI (GitHub Actions)](#ci-github-actions)
  • [Results](#results)
  • [License](#license)

Publication

This implementation accompanies the paper:

Christopher Chamberland, Jan Olle, Muyuan Li, Scott Thornton, and Igor Baratta, "Fast and accurate AI-based pre-decoders for surface codes," arXiv:2604.12841, 2026. doi:10.48550/arXiv.2604.12841

Please cite the paper if you use this repository in research or published work.

High-level workflow

┌────────────────────────────────────────┐  Uses:
│  1. Train or Download Model            │   - Ising-Decoding repo (train)
│                                        │   - Hugging Face (download)
└──────────────────┬─────────────────────┘
                   │
                   ▼
┌────────────────────────────────────────┐  Uses:
│  2. Assess Performance                 │   - Ising-Decoding repo
│     (Run inference tests)              │
└──────────────────┬─────────────────────┘
                   │
┌──────────────────▼─────────────────────┐  Uses:
│  3. Investigate Realtime Performance   │   - Ising-Decoding repo (3a, 3b)
│                                        │   - CUDA-Q QEC (3c)
│   ┌────────────────────────────────┐   │
│   │ 3a. Enable ONNX_WORKFLOW &     │   │
│   │     choose quantization format │   │
│   └──────────────┬─────────────────┘   │
│                  │                     │
│   ┌──────────────▼─────────────────┐   │
│   │ 3b. Run generate_test_data.py  │   │
│   └──────────────┬─────────────────┘   │
│                  │                     │
│   ┌──────────────▼─────────────────┐   │
│   │ 3c. Take .onnx and .bin files  │   │
│   │     into CUDA-Q QEC            │   │
│   └────────────────────────────────┘   │
└────────────────────────────────────────┘

Quick start (train + inference)

From the repo root:

  • code/scripts/local_run.sh

This script runs the Hydra workflow locally (no SLURM required) and reads one user-facing config file:

  • conf/config_public.yaml

Dependencies

Target Python versions: 3.11, 3.12, 3.13.

Two minimal requirements files are provided:

  • code/requirements_public_inference.txt (Stim + PyTorch path)
  • code/requirements_public_train-cuXY.txt (training path, where XY = 12 or 13)

Install examples (virtual environment is optional but recommended):

# Optional: create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Optional: install CUDA-enabled PyTorch (example: pick any available cuXXX)
# Pick one that matches your CUDA runtime; cu130 is known to work.
export TORCH_CUDA=cu130

# Inference-only (training install is a superset)
pip install -r code/requirements_public_inference.txt

# Training (includes inference deps, adjust to cu13 as appropriate)
pip install -r code/requirements_public_train-cu12.txt

bash code/scripts/check_python_compat.sh

Tip: To force CUDA-enabled PyTorch, set TORCH_CUDA=cuXXX (recommended cu13x) or TORCH_WHL_INDEX=https://download.pytorch.org/whl/cuXXX before running installs.
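The naming conventions in this section (supported Python versions, the cuXY suffix on the training requirements file, and the cuXXX wheel index URL) can be captured in a small pre-install check. This is a sketch only: the helper names `python_is_supported`, `train_requirements_path`, and `torch_wheel_index` are hypothetical and not part of the repository, and the script does not verify that the chosen wheel index actually exists.

```python
import sys

# Target Python versions listed in this README.
SUPPORTED_PYTHONS = {(3, 11), (3, 12), (3, 13)}


def python_is_supported(version=None):
    """Check a (major, minor, ...) tuple, defaulting to the running interpreter."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) in SUPPORTED_PYTHONS


def train_requirements_path(cuda_major):
    """Build the training requirements path for CUDA 12 or 13 (the only two listed)."""
    if cuda_major not in (12, 13):
        raise ValueError("only cu12 and cu13 training requirements files are listed")
    return f"code/requirements_public_train-cu{cuda_major}.txt"


def torch_wheel_index(cuda_tag):
    """Build a TORCH_WHL_INDEX value from a tag such as 'cu130'."""
    return f"https://download.pytorch.org/whl/{cuda_tag}"


print("Python supported:", python_is_supported())
print("Training requirements:", train_requirements_path(13))
print("Wheel index:", torch_wheel_index("cu130"))
```

The repository's own check remains `bash code/scripts/check_python_compat.sh`; the sketch above only mirrors the conventions stated in the text.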

Quick start:

# Train (reads conf/config_public.yaml)
bash code/scripts/local_run.sh

# Inference (loads a saved model from outputs//models/*)
WORKFLOW=inference bash code/scripts/local_run.sh

Inference note:

  • On bare metal, keep the default DataLoader workers.
  • In containers, set a larger shared-memory size (e.g., docker run --shm-size=1g ...).
  • If you cannot change --shm-size, set PREDECODER_INFERENCE_NUM_WORKERS=0 to avoid shared-memory worker crashes.
  • Default evaluation is heavy (cfg.test.num_samples=262144 shots per basis); expect inference to take time.
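When the runner is driven from Python (for example from a notebook), the environment variables mentioned above can be set per launch instead of exported globally. The following is a minimal sketch: `build_launch` and `launch` are hypothetical helpers, and only `WORKFLOW`, `PREDECODER_INFERENCE_NUM_WORKERS`, and the script path are taken from this README. The example at the bottom only builds the command and does not execute it.

```python
import os
import subprocess

RUNNER = "code/scripts/local_run.sh"  # runner script named in this README


def build_launch(workflow=None, inference_workers=None, base_env=None):
    """Return (command, environment) for one invocation of the runner script."""
    env = dict(os.environ if base_env is None else base_env)
    if workflow is not None:
        env["WORKFLOW"] = workflow
    if inference_workers is not None:
        # 0 avoids shared-memory worker crashes when --shm-size cannot be changed.
        env["PREDECODER_INFERENCE_NUM_WORKERS"] = str(inference_workers)
    return ["bash", RUNNER], env


def launch(**kwargs):
    """Run the script from the repo root; raises CalledProcessError on failure."""
    cmd, env = build_launch(**kwargs)
    return subprocess.run(cmd, env=env, check=True)


# Dry run: inference in a container with a small /dev/shm.
cmd, env = build_launch(workflow="inference", inference_workers=0, base_env={})
print(cmd)
print(env)
```

Calling `launch(workflow="inference", inference_workers=0)` from the repo root would be the equivalent of `WORKFLOW=inference PREDECODER_INFERENCE_NUM_WORKERS=0 bash code/scripts/local_run.sh`.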

Troubleshooting

  • Avoid `steps_per_epoch=0` on short runs:
      • Keep PREDECODER_TRAIN_SAMPLES >= per_device_batch_size * accumulate_steps * world_size.
      • Note: the batch schedule jumps to 2048 after epoch 0, so epoch 1 uses an effective batch size of 2048 * 2 * world_size.
      • For quick short runs, use GPUS=1 and PREDECODER_TRAIN_SAMPLES >= 4096.
  • Segfaults during training startup (torch.compile):
      • Some environments crash during torch.compile.
      • Disable compile: TORCH_COMPILE=0 bash code/scripts/local_run.sh.
      • Or try a safer mode: TORCH_COMPILE=1 TORCH_COMPILE_MODE=reduce-overhead bash code/scripts/local_run.sh.
  • Blackwell GPUs (RTX 5080/5090, GB200/GB300):
      • Stable PyTorch wheels (cu124) do not ship SM 12.0 kernels.

Install the nightly build with the cu128 index:

pip install --pre torch --index-url…
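Returning to the `steps_per_epoch=0` item in Troubleshooting: the sample-count rule is simple arithmetic and can be checked before launching a short run. The sketch below assumes that steps per epoch are computed by floor division of the sample count by the effective batch size (the README does not state the exact formula), and it reads the "2048 * 2" in the note as a per-device batch size of 2048 with 2 accumulation steps. The function names are hypothetical.

```python
def effective_batch_size(per_device_batch_size, accumulate_steps, world_size):
    """Samples consumed by one optimizer step across all devices."""
    return per_device_batch_size * accumulate_steps * world_size


def steps_per_epoch(train_samples, per_device_batch_size, accumulate_steps, world_size):
    """Assumption: incomplete batches are dropped, hence floor division."""
    return train_samples // effective_batch_size(
        per_device_batch_size, accumulate_steps, world_size
    )


# Epoch 1 values from the note above: batch size 2048, 2 accumulation steps.
for samples, gpus in [(4095, 1), (4096, 1), (4096, 2), (8192, 2)]:
    steps = steps_per_epoch(samples, 2048, 2, gpus)
    status = "ok" if steps > 0 else "too few samples"
    print(f"PREDECODER_TRAIN_SAMPLES={samples} GPUS={gpus}: {steps} step(s), {status}")
```

Under these assumptions, 4096 samples are enough for one step on a single GPU but not on two, which is consistent with the GPUS=1 recommendation for quick runs.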

