What does this repo signal mean?

NVIDIA published NVIDIA/Ising-Decoding (Python). A repository signal like this can expose tooling, eval, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo NVIDIA/Ising-Decoding · language Python (the captured GitHub metadata lists Jupyter Notebook) · New repo, moderate stars, not notable. onlylabs links this event to 1 captured evidence page and 6 related repo signals. It also maps to Infrastructure in the data-business radar.

NVIDIA Repo: NVIDIA/Ising-Decoding

Captured source

GitHub/github.com/NVIDIA/Ising-Decoding

NVIDIA/Ising-Decoding repository metadata

published Mar 3, 2026 · seen Jun 5 · captured Jun 11 · http 200 · method plain

NVIDIA/Ising-Decoding

Description: A set of training recipes for AI Quantum Error Correction Decoders

Language: Jupyter Notebook

License: Apache-2.0

Stars: 107

Forks: 39

Open issues: 0

Created: 2026-03-03T01:07:10Z

Pushed: 2026-06-01T14:49:40Z

Default branch: main

Fork: no

Archived: no

README:

Ising Decoding

This repo offers AI training recipes to build, customize and deploy scalable quantum error correction decoders:

A neural network consumes detector syndromes across space and time
It predicts corrections that reduce syndrome density / improve decoding
A standard decoder (PyMatching) produces the final logical decision
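
The three steps above can be illustrated with a deliberately small toy. This is a sketch under loud assumptions, not the repo's method: a one-dimensional repetition code stands in for the surface code, a hand-written local rule (`predecode`, hypothetical) stands in for the neural network, and a minimum-weight choice (`global_decode`, hypothetical) stands in for PyMatching. It only shows the division of labor: the pre-decoder clears the detectors it can explain locally, and the standard decoder resolves what remains.

```python
# Toy illustration only. None of these function names come from the repo.

def syndrome(errors):
    """One detector per adjacent pair of data bits: fires when the pair differs."""
    return [errors[i] ^ errors[i + 1] for i in range(len(errors) - 1)]


def predecode(synd):
    """Hypothetical local rule: two adjacent fired detectors are explained by
    flipping the data bit between them. Returns (correction, residual syndrome)."""
    n = len(synd) + 1
    correction = [0] * n
    residual = list(synd)
    for i in range(1, n - 1):
        if residual[i - 1] and residual[i]:
            correction[i] = 1
            residual[i - 1] = residual[i] = 0
    return correction, residual


def global_decode(synd):
    """Stand-in for the standard decoder: the lowest-weight correction whose
    syndrome equals synd (a repetition code has exactly two candidates)."""
    candidate = [0]
    for bit in synd:
        candidate.append(candidate[-1] ^ bit)
    flipped = [b ^ 1 for b in candidate]
    return candidate if sum(candidate) <= sum(flipped) else flipped


def decode(errors):
    """Run pre-decoder then global decoder and report syndrome density and outcome."""
    synd = syndrome(errors)
    pre, residual = predecode(synd)
    rest = global_decode(residual)
    total = [a ^ b for a, b in zip(pre, rest)]
    leftover = [e ^ c for e, c in zip(errors, total)]
    return {
        "fired_before": sum(synd),
        "fired_after": sum(residual),
        "logical_error": any(leftover),
    }
```

For example, `decode([0, 1, 0, 0, 0, 1, 0])` reports four fired detectors before pre-decoding and none afterwards, with no logical error. A boundary error such as `[1, 0, 0, 0, 0]` fires a single detector that the local rule cannot pair up, so it is left for the global decoder.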

The public release exposes a single user-facing config and a single runner script.

![Pre-decoder pipeline](images/predecoder_pipeline.png)

[Publication](#publication)
[High-level workflow](#high-level-workflow)
[Quick start (train + inference)](#quick-start-train--inference)
[Dependencies](#dependencies)
[Troubleshooting](#troubleshooting)
[Inference (pre-trained models)](#inference-pre-trained-models)
[Model export and downstream tools](#model-export-and-downstream-tools)
[Converting .pt checkpoints to SafeTensors](#converting-pt-checkpoints-to-safetensors-optional-post-training)
[ONNX export and quantization](#onnx-export-and-quantization-optional-post-training)
[Generating data for CUDA-Q QEC](#generating-data-for-cuda-q-qec-realtime-predecoder-test-application)
[Offline decoding from Stim detector samples](#offline-decoding-from-stim-detector-samples)
[Decoder ablation study with cudaq-qec](#decoder-ablation-study-with-cudaq-qec-optional)
[Configuration and advanced usage](#configuration-and-advanced-usage)
[GPU selection](#gpu-selection)
[Public configuration](#public-configuration-confconfig_publicyaml)
[Precomputed frames](#precomputed-frames-recommended)
[Resuming training and running inference](#resuming-training-and-running-inference-on-a-trained-model)
[Logging and outputs](#logging-and-outputs)
[What gets written where](#what-gets-written-where)
[Evaluation defaults](#evaluation-defaults-public-release)
[Testing and CI](#testing-and-ci)
[Testing (CPU + GPU)](#testing-cpu--gpu)
[CI (GitHub Actions)](#ci-github-actions)
[Results](#results)
[License](#license)

Publication

This implementation accompanies the paper:

Christopher Chamberland, Jan Olle, Muyuan Li, Scott Thornton, and Igor Baratta, "Fast and accurate AI-based pre-decoders for surface codes," arXiv:2604.12841, 2026. doi:10.48550/arXiv.2604.12841

Please cite the paper if you use this repository in research or published work.

High-level workflow

┌────────────────────────────────────────┐   Uses:
│  1. Train or Download Model            │   - Ising-Decoding repo (train)
│                                        │   - Hugging Face (download)
└──────────────────┬─────────────────────┘
                   │
                   ▼
┌────────────────────────────────────────┐   Uses:
│  2. Assess Performance                 │   - Ising-Decoding repo
│     (Run inference tests)              │
└──────────────────┬─────────────────────┘
                   │
┌──────────────────▼─────────────────────┐   Uses:
│  3. Investigate Realtime Performance   │   - Ising-Decoding repo (3a, 3b)
│                                        │   - CUDA-Q QEC (3c)
│  ┌────────────────────────────────┐    │
│  │ 3a. Enable ONNX_WORKFLOW &     │    │
│  │     choose quantization format │    │
│  └──────────────┬─────────────────┘    │
│                 │                      │
│  ┌──────────────▼─────────────────┐    │
│  │ 3b. Run generate_test_data.py  │    │
│  └──────────────┬─────────────────┘    │
│                 │                      │
│  ┌──────────────▼─────────────────┐    │
│  │ 3c. Take .onnx and .bin files  │    │
│  │     into CUDA-Q QEC            │    │
│  └────────────────────────────────┘    │
└────────────────────────────────────────┘

Quick start (train + inference)

From the repo root:

code/scripts/local_run.sh

This script runs the Hydra workflow locally (no SLURM required) and reads one user-facing config file:

conf/config_public.yaml

Dependencies

Target Python versions: 3.11, 3.12, 3.13.

Two minimal requirements files are provided:

code/requirements_public_inference.txt (Stim + PyTorch path)
code/requirements_public_train-cuXY.txt (training path, where XY = 12 or 13)

Install examples (virtual environment is optional but recommended):

# Optional: create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Optional: install CUDA-enabled PyTorch (example: pick any available cuXXX)
# Pick one that matches your CUDA runtime; cu130 is known to work.
export TORCH_CUDA=cu130

# Inference-only (training install is a superset)
pip install -r code/requirements_public_inference.txt

# Training (includes inference deps, adjust to cu13 as appropriate)
pip install -r code/requirements_public_train-cu12.txt

bash code/scripts/check_python_compat.sh

Tip: To force CUDA-enabled PyTorch, set TORCH_CUDA=cuXXX (recommended cu13x) or TORCH_WHL_INDEX=https://download.pytorch.org/whl/cuXXX before running installs.

Quick start:

# Train (reads conf/config_public.yaml)
bash code/scripts/local_run.sh

# Inference (loads a saved model from outputs//models/*)
WORKFLOW=inference bash code/scripts/local_run.sh
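
When the two commands above need to be launched from Python (for instance from a sweep script), a thin wrapper along these lines may be enough. It is a sketch: `build_run` and `run` are hypothetical helpers, and the only facts taken from the README are the script path and the `WORKFLOW=inference` variable. Treating an unset `WORKFLOW` as "train" mirrors the first command above and is otherwise an assumption.

```python
import os
import subprocess

RUNNER = "code/scripts/local_run.sh"  # path from the README, relative to the repo root


def build_run(workflow=None, extra_env=None):
    """Return (command, environment) for one invocation of the runner script.

    workflow=None removes WORKFLOW from the environment, matching the plain
    training command; any other value is passed through as WORKFLOW=<value>.
    """
    env = dict(os.environ)
    if workflow is None:
        env.pop("WORKFLOW", None)
    else:
        env["WORKFLOW"] = workflow
    env.update(extra_env or {})
    return ["bash", RUNNER], env


def run(workflow=None, extra_env=None, repo_root="."):
    """Run the script from repo_root and raise CalledProcessError on failure."""
    cmd, env = build_run(workflow, extra_env)
    subprocess.run(cmd, cwd=repo_root, env=env, check=True)
```

Calling `run()` from the repo root corresponds to the training command, and `run("inference")` to the inference command.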

Inference note:

On bare metal, keep the default DataLoader workers.
In containers, set a larger shared-memory size (e.g., docker run --shm-size=1g ...).
If you cannot change --shm-size, set PREDECODER_INFERENCE_NUM_WORKERS=0 to avoid shared-memory worker crashes.
Default evaluation is heavy (cfg.test.num_samples=262144 shots per basis); expect inference to take time.
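
The worker guidance above amounts to a small decision rule, sketched below. This is not the repo's code: `inference_num_workers` is a hypothetical helper, the 1 GiB threshold simply mirrors the `--shm-size=1g` example, and inspecting `/dev/shm` is an assumption about where shared memory is mounted on Linux.

```python
import os
import shutil


def inference_num_workers(default_workers, shm_path="/dev/shm", min_shm_bytes=1 << 30):
    """Pick a DataLoader worker count for inference.

    Assumptions (not from the repo): the override variable is parsed as an
    integer, and workers are disabled when shared memory is below min_shm_bytes.
    """
    override = os.environ.get("PREDECODER_INFERENCE_NUM_WORKERS")
    if override is not None:
        return int(override)
    try:
        shm_total = shutil.disk_usage(shm_path).total
    except OSError:
        return default_workers  # no shared-memory mount to inspect
    return default_workers if shm_total >= min_shm_bytes else 0
```

An explicit `PREDECODER_INFERENCE_NUM_WORKERS` always wins; otherwise the default is kept unless the shared-memory mount looks too small.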

Troubleshooting

Avoid `steps_per_epoch=0` on short runs:
Keep PREDECODER_TRAIN_SAMPLES >= per_device_batch_size * accumulate_steps * world_size.
Note: the batch schedule jumps to 2048 after epoch 0, so epoch 1 uses an effective batch size of 2048 * 2 * world_size.

For quick short runs, use GPUS=1 and PREDECODER_TRAIN_SAMPLES >= 4096.
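
The sample-count rule above is plain arithmetic, and a short check can catch a bad setting before a run starts. The sketch assumes that steps per epoch are computed by floor division of the sample count by the effective batch size; the helper names are hypothetical.

```python
def steps_per_epoch(train_samples, per_device_batch_size, accumulate_steps, world_size):
    """Optimizer steps per epoch, assuming partial batches are dropped (floor division)."""
    effective_batch = per_device_batch_size * accumulate_steps * world_size
    return train_samples // effective_batch


def min_train_samples(per_device_batch_size, accumulate_steps, world_size):
    """Smallest sample count that still yields at least one step per epoch."""
    return per_device_batch_size * accumulate_steps * world_size
```

With the epoch 1 figures quoted above (2048 * 2 * world_size) and one GPU, `min_train_samples(2048, 2, 1)` gives 4096, which matches the README's recommendation. One sample fewer, `steps_per_epoch(4095, 2048, 2, 1)`, gives 0.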
Segfaults during training startup (torch.compile):
Some environments crash during torch.compile.
Disable compile: TORCH_COMPILE=0 bash code/scripts/local_run.sh.
Or try a safer mode: TORCH_COMPILE=1 TORCH_COMPILE_MODE=reduce-overhead bash code/scripts/local_run.sh.
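
A training script could honor the two variables above roughly as follows. This is a sketch, not the repo's implementation: the helper names are hypothetical, and it assumes that `TORCH_COMPILE=0` disables compilation while any other value (or an unset variable) enables it. `torch.compile` and its `mode` argument are the standard PyTorch API.

```python
import os


def compile_settings(env=None):
    """Read the two compile-related variables named in the README.

    Assumption: "0" disables compilation and anything else enables it;
    the repo's own parsing and default may differ.
    """
    env = os.environ if env is None else env
    enabled = env.get("TORCH_COMPILE", "1") != "0"
    mode = env.get("TORCH_COMPILE_MODE")  # e.g. "reduce-overhead"; None keeps PyTorch's default
    return enabled, mode


def maybe_compile(model, env=None):
    """Wrap model with torch.compile only when enabled; otherwise return it unchanged."""
    enabled, mode = compile_settings(env)
    if not enabled:
        return model
    import torch  # imported lazily so the settings helper works without PyTorch

    return torch.compile(model, mode=mode) if mode else torch.compile(model)
```

Keeping the parsing separate from the `torch` import makes the disabled path cheap to test on machines where compilation crashes.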
Blackwell GPUs (RTX 5080/5090, GB200/GB300):
Stable PyTorch wheels (cu124) do not ship SM 12.0 kernels.

Install the nightly build with the cu128 index:

pip install --pre torch --index-url...

Excerpt shown — open the source for the full document.

Notability

notability 4.0/10

New repo, moderate stars, not notable.