What does this model signal mean?

NVIDIA published nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16. This model signal is evidence of what shipped on model infrastructure and of how the release is positioned. High-signal details: license other · 750 HF downloads · large model release from NVIDIA with moderate traction. onlylabs links this event to 1 captured evidence page and 6 related model signals.

NVIDIA Model: nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16

Captured source


Hugging Face · huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16 model card


published Jun 3, 2026 · seen Jun 6 · captured Jun 11 · http 200 · method plain · task text-generation · license other · library transformers · params 561B · downloads 750 · likes 29

NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16

Model Overview

Model Developer: NVIDIA Corporation

Model Dates:

December 2025 - April 2026

Data Freshness:

The pre-training data has a cutoff date of September 2025.

Description

NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16 is a large language model (LLM) trained by NVIDIA.

The model employs a hybrid Latent Mixture-of-Experts (LatentMoE) architecture, using interleaved Mamba-2 and MoE layers along with select Attention layers. Unlike the Nano model, the Ultra model incorporates Multi-Token Prediction (MTP) layers for faster text generation and improved quality, and it is pre-trained using an NVFP4 recipe to maximize compute efficiency. The model has 550B parameters in total, of which 55B are active per token.
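MTP layers guess several future tokens at once; at inference those guesses can act as a draft that the full model verifies, which is where the generation speed-up comes from. The sketch below shows greedy draft-and-verify speculative decoding under stated assumptions: `draft_fn` is a hypothetical stand-in for the MTP heads, `target_fn` for the full model's greedy next-token choice, and both are toy functions over integer tokens rather than Nemotron's actual interfaces.

```python
from typing import Callable, List

Token = int
DraftFn = Callable[[List[Token], int], List[Token]]
TargetFn = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_fn: DraftFn,
                     target_fn: TargetFn, k: int) -> List[Token]:
    """One greedy draft-and-verify step; returns the tokens accepted (at least one).

    draft_fn is a hypothetical stand-in for MTP heads proposing k tokens;
    target_fn is the full model's greedy next-token choice. A real system scores
    all k drafts in one batched forward pass; this loop calls target_fn per
    position only to keep the logic readable.
    """
    draft = draft_fn(prefix, k)
    context = list(prefix)  # copy so the caller's prefix is not mutated
    accepted: List[Token] = []
    for proposed in draft:
        expected = target_fn(context)
        if proposed != expected:
            accepted.append(expected)  # first mismatch: keep the target's token, stop
            return accepted
        accepted.append(proposed)
        context.append(proposed)
    accepted.append(target_fn(context))  # every draft accepted: one bonus token
    return accepted


# Toy stand-ins over integer tokens (hypothetical, for illustration only).
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token], k: int) -> List[Token]:
    out, last = [], ctx[-1]
    for i in range(k):
        last = (last + 1) % 10 if i < 2 else 0  # right for two tokens, then wrong
        out.append(last)
    return out


print(speculative_step([3], toy_draft, toy_target, k=4))  # [4, 5, 6]
```

The accepted tokens are always exactly what greedy decoding with the full model alone would produce; a good draft only changes how many tokens each verification pass yields.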

The supported languages include: English, French, Spanish, Italian, German, Japanese, Hindi, Korean, Brazilian Portuguese, and Chinese.

This model is ready for commercial and non-commercial use.

What is Nemotron?

NVIDIA Nemotron™ is a family of open models with open weights, training data, and recipes, delivering leading efficiency and accuracy for building specialized AI agents.

License/Terms of Use

Use of this model is governed by the OpenMDW License Agreement, version 1.1 (OpenMDW-1.1).

Benchmarks

| Task | Metric | Nemotron-3-Ultra 550B-A55B-Base | DeepSeek-V3.2 Exp-Base | Mistral-Large-3 675B-Base-2512 | Kimi-K2 Base | GLM-4.5 Base |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: |
| General Knowledge | | | | | | |
| MMLU | *5-shot, acc* | 89.08 | 87.82 | 87.35 | 87.60 | 86.50 |
| MMLU-Pro | *5-shot, CoT EM* | 79.07 | 63.26 | 67.42 | 69.15 | 65.78 |
| AGIEval-En | *3/5-shot, CoT EM* | 78.73 | 70.13 | 69.30 | 72.55 | 70.06 |
| GPQA | *5-shot, CoT EM* | 50.00 | 31.82 | 34.85 | 43.43 | 34.85 |
| Math | | | | | | |
| GSM8K | *8-shot, CoT EM* | 88.10 | 84.38 | 91.21 | 91.05 | 85.37 |
| MATH | *4-shot, EM* | 82.00 | 60.12 | 62.88 | 68.40 | 57.58 |
| Code | | | | | | |
| HumanEval | *sampled pass@1 n=32, EvalPlus sanitized* | 83.84 | 61.85 | 66.71 | 78.20 | 78.16 |
| MBPP-Sanitized | *3-shot pass@1 n=32, EvalPlus sanitized* | 85.97 | 58.66 | 84.08 | 72.14 | 76.69 |
| Commonsense Understanding | | | | | | |
| ARC-Challenge | *25-shot, acc_norm* | 97.35 | 95.22 | 97.27 | 95.82 | 96.59 |
| HellaSwag | *10-shot, acc_norm* | 90.51 | 89.44 | 88.88 | 90.92 | 90.17 |
| OpenBookQA | *0-shot, acc_norm* | 48.60 | 48.20 | 51.40 | 50.80 | 49.60 |
| PIQA | *0-shot, acc_norm* | 83.79 | 85.09 | 84.82 | 85.47 | 85.09 |
| WinoGrande | *5-shot, acc* | 79.32 | 83.43 | 82.08 | 84.21 | 85.24 |
| Reading Comprehension | | | | | | |
| RACE | *0-shot, acc* | 92.15 | 93.21 | 93.30 | 91.96 | 92.15 |
| Multilingual | | | | | | |
| MMLU Global Lite | *5-shot, avg* | 90.13 | 85.59 | 87.34 | 85.63 | 85.81 |
| MGSM | *8-shot, native CoT avg* | 87.73 | 82.33 | 82.93 | 85.20 | 81.27 |
| Long Context | | | | | | |
| RULER 64K | *0-shot* | 95.30 | 93.30 | 90.11 | 93.79 | 16.12 |
| RULER 128K | *0-shot* | 92.49 | 91.88 | 55.77 | 88.61 | 0.00 |
| RULER 256K | *0-shot* | 86.22 | -- | 35.50 | -- | -- |
| RULER 512K | *0-shot* | 84.54 | -- | -- | -- | -- |
| RULER 1M | *0-shot* | 76.83 | -- | -- | -- | -- |

Comparison of Nemotron-3-Ultra-550B-A55B-Base, [DeepSeek-V3.2-Exp-Base](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp-Base), [Mistral-Large-3-675B-Base-2512](https://huggingface.co/mistralai/Mistral-Large-3-675B-Base-2512), [Kimi-K2-Base](https://huggingface.co/moonshotai/Kimi-K2-Base), and [GLM-4.5-Base](https://huggingface.co/zai-org/GLM-4.5-Base). Best available results are marked in bold.
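The bold markup did not survive capture, so the per-row leader can be recomputed from the table. The small sketch below transcribes the scores above (None for "--", so unreported results are simply skipped), assumes higher is better for every listed metric, and reports the best model per benchmark while keeping ties. The short model labels are abbreviations of the column headers.

```python
from typing import Dict, List, Optional

MODELS = ["Nemotron-3-Ultra", "DeepSeek-V3.2-Exp", "Mistral-Large-3", "Kimi-K2", "GLM-4.5"]

# Scores transcribed from the table above, in column order; None marks "--".
SCORES: Dict[str, List[Optional[float]]] = {
    "MMLU": [89.08, 87.82, 87.35, 87.60, 86.50],
    "MMLU-Pro": [79.07, 63.26, 67.42, 69.15, 65.78],
    "AGIEval-En": [78.73, 70.13, 69.30, 72.55, 70.06],
    "GPQA": [50.00, 31.82, 34.85, 43.43, 34.85],
    "GSM8K": [88.10, 84.38, 91.21, 91.05, 85.37],
    "MATH": [82.00, 60.12, 62.88, 68.40, 57.58],
    "HumanEval": [83.84, 61.85, 66.71, 78.20, 78.16],
    "MBPP-Sanitized": [85.97, 58.66, 84.08, 72.14, 76.69],
    "ARC-Challenge": [97.35, 95.22, 97.27, 95.82, 96.59],
    "HellaSwag": [90.51, 89.44, 88.88, 90.92, 90.17],
    "OpenBookQA": [48.60, 48.20, 51.40, 50.80, 49.60],
    "PIQA": [83.79, 85.09, 84.82, 85.47, 85.09],
    "WinoGrande": [79.32, 83.43, 82.08, 84.21, 85.24],
    "RACE": [92.15, 93.21, 93.30, 91.96, 92.15],
    "MMLU Global Lite": [90.13, 85.59, 87.34, 85.63, 85.81],
    "MGSM": [87.73, 82.33, 82.93, 85.20, 81.27],
    "RULER 64K": [95.30, 93.30, 90.11, 93.79, 16.12],
    "RULER 128K": [92.49, 91.88, 55.77, 88.61, 0.00],
    "RULER 256K": [86.22, None, 35.50, None, None],
    "RULER 512K": [84.54, None, None, None, None],
    "RULER 1M": [76.83, None, None, None, None],
}


def leaders(scores: Dict[str, List[Optional[float]]]) -> Dict[str, List[str]]:
    """Return the top-scoring model(s) per task, keeping ties; higher is assumed better."""
    result: Dict[str, List[str]] = {}
    for task, row in scores.items():
        best = max(v for v in row if v is not None)
        result[task] = [m for m, v in zip(MODELS, row) if v == best]
    return result


for task, names in leaders(SCORES).items():
    print(f"{task}: {', '.join(names)}")
```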

All evaluation results were collected via the NeMo Evaluator SDK and NVIDIA's open-source container of LM Evaluation Harness, unless otherwise stated. For reproducibility, more details on the evaluation settings can be found in the NeMo Evaluator SDK examples folder and the reproducibility tutorial for Nemotron 3 Ultra. The open-source LM Evaluation Harness container, packaged via NVIDIA's NeMo Evaluator SDK and used for these evaluations, can be found here.

Deployment Geography: Global

Use Case

This model is intended for developers and researchers building LLMs.

Release Date

Hugging Face - 06/04/2026 via Hugging Face

Reference(s)

Model Architecture

Architecture Type: Mamba2-Transformer Hybrid Latent Mixture of Experts (LatentMoE) with Multi-Token Prediction (MTP)
Network Architecture: Nemotron Hybrid LatentMoE
Number of model parameters: 550B Total / 55B Active
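The total/active split means each token passes through only about a tenth of the weights, while every weight must still be stored. A back-of-the-envelope sketch, assuming the card's nominal 550B total / 55B active figures (the page metadata lists 561B), 2 bytes per BF16 weight, and the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass:

```python
BYTES_PER_BF16 = 2


def moe_footprint(total_params: float, active_params: float) -> dict:
    """Rough estimates for a sparse MoE checkpoint stored in BF16."""
    return {
        # Fraction of the weights each token actually uses.
        "active_fraction": active_params / total_params,
        # Weight storage only (decimal GB): ignores KV/SSM state and activations.
        "bf16_weight_gb": total_params * BYTES_PER_BF16 / 1e9,
        # Rule of thumb: ~2 FLOPs (one multiply, one add) per active parameter per token.
        "forward_gflops_per_token": 2 * active_params / 1e9,
    }


est = moe_footprint(550e9, 55e9)
for name, value in est.items():
    print(f"{name}: {value:,.1f}")
# prints active_fraction: 0.1, bf16_weight_gb: 1,100.0, forward_gflops_per_token: 110.0
```

In other words, the BF16 checkpoint needs roughly 1.1 TB just for weights, while per-token compute is closer to that of a 55B dense model.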

Model Design

The model was pre-trained on around 20T tokens and supports a context length of up to 1M tokens. Pre-training used an NVFP4 recipe. The model uses the LatentMoE architecture, in which tokens are projected into a smaller latent dimension for expert routing and computation, improving accuracy per byte. It also includes Multi-Token Prediction (MTP) layers, which predict multiple future tokens to provide a richer training signal and enable faster inference via speculative decoding.
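The LatentMoE idea (route and run experts in a smaller latent space, then project back) can be illustrated with a toy forward pass. This is a minimal sketch under explicit assumptions, not NVIDIA's implementation: the dimensions, the top-k softmax router, the ReLU expert MLPs, and the shared down/up projections (`ToyLatentMoE`, `expert_weights`) are hypothetical choices for illustration only.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output dimension


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random weights scaled by 1/sqrt(fan_in)."""
    std = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class ToyLatentMoE:
    """Hypothetical LatentMoE block: project down, route and run experts in latent space, project up."""

    def __init__(self, d_model: int, d_latent: int, d_ff: int,
                 n_experts: int, top_k: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.top_k = top_k
        self.down = rand_matrix(d_latent, d_model, rng)      # d_model -> d_latent (shared)
        self.up = rand_matrix(d_model, d_latent, rng)        # d_latent -> d_model (shared)
        self.router = rand_matrix(n_experts, d_latent, rng)  # routing logits from the latent vector
        # Each expert is a ReLU MLP that lives entirely in the latent dimension.
        self.experts: List[Tuple[Matrix, Matrix]] = [
            (rand_matrix(d_ff, d_latent, rng), rand_matrix(d_latent, d_ff, rng))
            for _ in range(n_experts)
        ]

    def forward(self, x: Vector) -> Vector:
        z = matvec(self.down, x)
        logits = matvec(self.router, z)
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        # Softmax gate over the selected experts only (max-subtracted for stability).
        peak = max(logits[i] for i in top)
        gates = {i: math.exp(logits[i] - peak) for i in top}
        norm = sum(gates.values())
        mixed = [0.0] * len(z)
        for i in top:
            w_in, w_out = self.experts[i]
            hidden = [max(0.0, a) for a in matvec(w_in, z)]
            expert_out = matvec(w_out, hidden)
            g = gates[i] / norm
            mixed = [acc + g * o for acc, o in zip(mixed, expert_out)]
        # Residual connection and normalization are omitted for brevity.
        return matvec(self.up, mixed)


def expert_weights(d_in: int, d_ff: int) -> int:
    """Weights in one two-matrix expert MLP whose input/output width is d_in."""
    return 2 * d_in * d_ff


layer = ToyLatentMoE(d_model=16, d_latent=4, d_ff=8, n_experts=8, top_k=2)
y = layer.forward([0.1 * i for i in range(16)])
print(len(y))                                              # 16
print(expert_weights(4, 8), "vs", expert_weights(16, 8))   # 64 vs 256
```

Because each toy expert works at width `d_latent` instead of `d_model`, its weight count shrinks by `d_model / d_latent` (4x here), which is one way to read the "accuracy per byte" framing: more or larger experts fit in the same memory budget.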

Training Methodology

Stage 1: Pre-Training

The NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16 model was pre-trained using an NVFP4 recipe on crawled and synthetic code, math, science, and general knowledge data.
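NVFP4 stores values as 4-bit floats (E2M1) with a scale shared by each small block of elements. The sketch below simulates that block-scaled FP4 round-trip on a Python list to show where rounding error comes from. Assumptions: 16-element blocks, a block scale of amax/6 kept in full precision (real NVFP4 encodes block scales as FP8 E4M3 alongside a per-tensor FP32 scale, omitted here), and nearest rounding with ties to the smaller magnitude rather than hardware round-to-nearest-even. It illustrates the number format only, not NVIDIA's training recipe; `fake_quantize` is a hypothetical helper name.

```python
from typing import List

# Non-negative magnitudes representable in FP4 E2M1; the sign bit is separate.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # number of consecutive elements that share one scale


def round_to_e2m1(x: float) -> float:
    """Round to the nearest E2M1 value; ties go to the smaller magnitude (a simplification)."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return mag if x >= 0 else -mag


def fake_quantize(values: List[float]) -> List[float]:
    """Quantize to block-scaled FP4 and immediately dequantize, to expose rounding error."""
    out: List[float] = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        amax = max(abs(v) for v in block)
        # Map the block's largest magnitude onto the largest FP4 value (6.0).
        # Assumption: the scale stays in full precision (NVFP4 stores it as FP8 E4M3).
        scale = amax / 6.0 if amax > 0 else 1.0
        out.extend(round_to_e2m1(v / scale) * scale for v in block)
    return out


weights = [0.013, -0.002, 0.04, -0.031] * 8  # 32 toy values -> two blocks
approx = fake_quantize(weights)
print(max(abs(a - b) for a, b in zip(weights, approx)))
```

Per-block scaling keeps each block's error bounded by its own scale (at most one scale unit, from the 4-to-6 gap), so a single outlier only coarsens the 16 values it shares a block with.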

Software used for pre-training: Megatron-LM

The NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16 model is the result of the above work.

Input

Input Type(s): Text
Input Format(s): String
**Input...


Notability

notability 8.0/10
