Model · NVIDIA · published Jun 3, 2026 · seen 5d

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16

task: text-generation · license: other · library: transformers · params: 561B · downloads: 1.6k · likes: 25

NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16

Model Overview

Model Developer: NVIDIA Corporation

Model Dates:

December 2025 - April 2026

Data Freshness:

  • The pre-training data has a cutoff date of September 2025.

Description

NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16 is a large language model (LLM) trained by NVIDIA.

The model employs a hybrid Latent Mixture-of-Experts (LatentMoE) architecture, utilizing interleaved Mamba-2 and MoE layers, along with select Attention layers. Distinct from the Nano model, the Ultra model incorporates Multi-Token Prediction (MTP) layers for faster text generation and improved quality, and it is pre-trained using an NVFP4 recipe to maximize compute efficiency. The model has 55B active parameters and 550B parameters in total.
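To make the active-versus-total distinction concrete: in a Mixture-of-Experts layer each token is routed to only a few experts, so only a fraction of the weights run per token (here 55B of 550B, or 10%). The following is a minimal Python sketch of a latent MoE layer under stated assumptions: the token is down-projected to a smaller latent width, scored by a router, sent to the top-k experts with softmax gate weights, processed in that latent space, and projected back to model width. All sizes, the single-matrix ReLU experts, and the class name `LatentMoESketch` are hypothetical; this is not NVIDIA's implementation.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random Gaussian matrix, used only to populate the toy layer."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class LatentMoESketch:
    """Toy latent MoE layer: route and compute experts in a smaller latent space.

    Hypothetical sizes and design, for illustration only.
    """

    def __init__(self, d_model=16, d_latent=4, n_experts=8, top_k=2, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        self.down = rand_matrix(d_latent, d_model, rng)      # d_model -> d_latent
        self.up = rand_matrix(d_model, d_latent, rng)        # d_latent -> d_model
        self.router = rand_matrix(n_experts, d_latent, rng)  # latent -> one logit per expert
        # Simplification: each expert is one latent-to-latent matrix followed by ReLU.
        self.experts = [rand_matrix(d_latent, d_latent, rng) for _ in range(n_experts)]

    def forward(self, x):
        z = matvec(self.down, x)                  # project the token into the latent space
        logits = matvec(self.router, z)           # score every expert
        chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        m = max(logits[i] for i in chosen)
        weights = [math.exp(logits[i] - m) for i in chosen]
        total = sum(weights)
        weights = [w / total for w in weights]    # softmax over the selected experts only
        mixed = [0.0] * len(z)
        for w, i in zip(weights, chosen):         # only the top_k experts are evaluated
            h = [max(0.0, v) for v in matvec(self.experts[i], z)]
            mixed = [a + w * b for a, b in zip(mixed, h)]
        return matvec(self.up, mixed), chosen     # project back to model width


layer = LatentMoESketch()
y, used = layer.forward([1.0] * 16)
print(len(y), len(used))  # 16 2
```

Only `top_k` of the `n_experts` expert matrices are touched for a given token; the remaining expert weights sit idle for that token, which is the property behind an active parameter count that is much smaller than the total.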

The supported languages include: English, French, Spanish, Italian, German, Japanese, Hindi, Korean, Brazilian Portuguese, and Chinese.

This model is ready for commercial and non-commercial use.

What is Nemotron?

NVIDIA Nemotron™ is a family of open models with open weights, training data, and recipes, delivering leading efficiency and accuracy for building specialized AI agents.

License/Terms of Use

Use of this model is governed by the OpenMDW License Agreement, version 1.1 (OpenMDW-1.1).

Benchmarks

| Task | Metric | Nemotron-3-Ultra-550B-A55B-Base | DeepSeek-V3.2-Exp-Base | Mistral-Large-3-675B-Base-2512 | Kimi-K2-Base | GLM-4.5-Base |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: |
| General Knowledge | | | | | | |
| MMLU | *5-shot, acc* | **89.08** | 87.82 | 87.35 | 87.60 | 86.50 |
| MMLU-Pro | *5-shot, CoT EM* | **79.07** | 63.26 | 67.42 | 69.15 | 65.78 |
| AGIEval-En | *3/5-shot, CoT EM* | **78.73** | 70.13 | 69.30 | 72.55 | 70.06 |
| GPQA | *5-shot, CoT EM* | **50.00** | 31.82 | 34.85 | 43.43 | 34.85 |
| Math | | | | | | |
| GSM8K | *8-shot, CoT EM* | 88.10 | 84.38 | **91.21** | 91.05 | 85.37 |
| MATH | *4-shot, EM* | **82.00** | 60.12 | 62.88 | 68.40 | 57.58 |
| Code | | | | | | |
| HumanEval | *sampled pass@1 n=32, EvalPlus sanitized* | **83.84** | 61.85 | 66.71 | 78.20 | 78.16 |
| MBPP-Sanitized | *3-shot pass@1 n=32, EvalPlus sanitized* | **85.97** | 58.66 | 84.08 | 72.14 | 76.69 |
| Commonsense Understanding | | | | | | |
| ARC-Challenge | *25-shot, acc_norm* | **97.35** | 95.22 | 97.27 | 95.82 | 96.59 |
| HellaSwag | *10-shot, acc_norm* | 90.51 | 89.44 | 88.88 | **90.92** | 90.17 |
| OpenBookQA | *0-shot, acc_norm* | 48.60 | 48.20 | **51.40** | 50.80 | 49.60 |
| PIQA | *0-shot, acc_norm* | 83.79 | 85.09 | 84.82 | **85.47** | 85.09 |
| WinoGrande | *5-shot, acc* | 79.32 | 83.43 | 82.08 | 84.21 | **85.24** |
| Reading Comprehension | | | | | | |
| RACE | *0-shot, acc* | 92.15 | 93.21 | **93.30** | 91.96 | 92.15 |
| Multilingual | | | | | | |
| MMLU Global Lite | *5-shot, avg* | **90.13** | 85.59 | 87.34 | 85.63 | 85.81 |
| MGSM | *8-shot, native CoT avg* | **87.73** | 82.33 | 82.93 | 85.20 | 81.27 |
| Long Context | | | | | | |
| RULER 64K | *0-shot* | **95.30** | 93.30 | 90.11 | 93.79 | 16.12 |
| RULER 128K | *0-shot* | **92.49** | 91.88 | 55.77 | 88.61 | 0.00 |
| RULER 256K | *0-shot* | **86.22** | -- | 35.50 | -- | -- |
| RULER 512K | *0-shot* | **84.54** | -- | -- | -- | -- |
| RULER 1M | *0-shot* | **76.83** | -- | -- | -- | -- |

Comparison of Nemotron-3-Ultra-550B-A55B-Base, [DeepSeek-V3.2-Exp-Base](https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp-Base), [Mistral-Large-3-675B-Base-2512](https://huggingface.co/mistralai/Mistral-Large-3-675B-Base-2512), [Kimi-K2-Base](https://huggingface.co/moonshotai/Kimi-K2-Base), and [GLM-4.5-Base](https://huggingface.co/zai-org/GLM-4.5-Base). Best available results are marked in bold.
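The Code rows report pass@1 estimated from n=32 samples per problem. A common way to compute such figures is the unbiased pass@k estimator introduced alongside HumanEval, 1 - C(n-c, k) / C(n, k), where c of the n samples pass; the card does not state which estimator its harness uses, so treat this as an illustrative sketch. The per-problem counts in the example are made up, and the table reports these fractions multiplied by 100.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them passed."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k=1):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem counts with n=32 samples each (not real evaluation data).
demo = [(32, 32), (32, 16), (32, 0)]
print(mean_pass_at_k(demo, k=1))  # 0.5
```

For k=1 the estimator reduces to c/n per problem, so sampling 32 completions mainly lowers the variance of the pass@1 estimate rather than changing what it measures.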

All evaluation results were collected with the NeMo Evaluator SDK and NVIDIA's open-source container of the LM Evaluation Harness, unless otherwise stated. For reproducibility, details on the evaluation settings are available in the NeMo Evaluator SDK examples folder and in the reproducibility tutorial for Nemotron 3 Ultra. The LM Evaluation Harness container used for these evaluations is packaged via NVIDIA's NeMo Evaluator SDK.

Deployment Geography: Global

Use Case

This model is intended for developers and researchers building LLMs.

Release Date

Hugging Face: 06/04/2026

Model Architecture

  • Architecture Type: Mamba2-Transformer Hybrid Latent Mixture of Experts (LatentMoE) with Multi-Token Prediction (MTP)
  • Network Architecture: Nemotron Hybrid LatentMoE
  • Number of model parameters: 550B Total / 55B Active
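The Mamba-2 layers in this hybrid keep a fixed-size recurrent state instead of attention's key-value cache, which grows with sequence length. A minimal sketch of the recurrence for a single channel, assuming Mamba-2's scalar-decay form h_t = exp(dt_t · A) · h_{t-1} + dt_t · B_t · x_t and y_t = C_t · h_t; it omits the convolution, gating, skip term, and the chunked parallel scan that production kernels use. The function name `ssd_scan` is hypothetical.

```python
import math


def ssd_scan(xs, dts, Bs, Cs, A=-1.0):
    """Sequential selective state-space scan for one channel (Mamba-2 style sketch).

    xs: inputs x_t; dts: step sizes dt_t > 0;
    Bs, Cs: per-step input/output vectors of state size N; A < 0 is a scalar.
    """
    n_state = len(Bs[0])
    h = [0.0] * n_state                  # state size is fixed, independent of sequence length
    ys = []
    for x, dt, B, C in zip(xs, dts, Bs, Cs):
        decay = math.exp(dt * A)         # one scalar decay shared by all state entries
        h = [decay * hi + dt * bi * x for hi, bi in zip(h, B)]
        ys.append(sum(ci * hi for ci, hi in zip(C, h)))
    return ys, h


ys, _ = ssd_scan(xs=[1.0, 0.0, 0.0], dts=[1.0] * 3,
                 Bs=[[1.0, 0.0]] * 3, Cs=[[1.0, 1.0]] * 3, A=-math.log(2))
print(ys)  # approximately [1.0, 0.5, 0.25]: the impulse decays by half each step
```

Because B_t, C_t, and dt_t vary per token, the layer can choose what to write into and read from its state, while memory per layer stays constant no matter how long the context grows.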

Model Design

The model was pre-trained on approximately 20T tokens and supports a context length of up to 1M tokens. Pre-training used an NVFP4 recipe. The model uses the LatentMoE architecture, in which tokens are projected into a smaller latent dimension for expert routing and computation, improving accuracy per byte. It also includes Multi-Token Prediction (MTP) layers, which predict multiple future tokens; this provides a richer training signal and enables faster inference through speculative decoding.
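In speculative decoding, cheap draft tokens (here, the ones proposed by MTP layers) are checked by the full model, and every draft token it agrees with is kept. A minimal sketch of greedy draft verification, assuming the main model is exposed as a hypothetical `target_next` callable that returns its greedy next token; real systems score all draft positions in one batched forward pass and may use sampling-based acceptance instead. The function and the toy model are illustrative, not NVIDIA's inference stack.

```python
from typing import Callable, List


def verify_draft(context: List[int], draft: List[int],
                 target_next: Callable[[List[int]], int]) -> List[int]:
    """Greedy speculative-decoding verification.

    Returns the tokens to append: the longest draft prefix the target model agrees
    with, followed by one token chosen by the target model itself.
    """
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(context + accepted)  # batched in a real system
        if tok != expected:
            return accepted + [expected]            # first mismatch: use the target's token
        accepted.append(tok)
    return accepted + [target_next(context + accepted)]  # all drafts accepted: bonus token


def toy_target(tokens: List[int]) -> int:
    """Hypothetical stand-in model: always predicts the previous token plus one."""
    return tokens[-1] + 1


print(verify_draft([1, 2], [3, 4, 9], toy_target))  # [3, 4, 5]
print(verify_draft([1, 2], [7], toy_target))        # [3]
```

With greedy verification the output is identical to plain greedy decoding of the main model; the speedup comes from accepting several drafted tokens per verification pass.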

Training Methodology

Stage 1: Pre-Training

  • The NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16 model was pre-trained using an NVFP4 recipe on crawled and synthetic code, math, science, and general-knowledge data.

The NVIDIA-Nemotron-3-Ultra-550B-A55B-Base-BF16 model is the result of this pre-training stage.
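The card does not detail the NVFP4 recipe itself. As background on the number format only, NVIDIA has publicly described NVFP4 as 4-bit E2M1 values grouped into 16-element blocks, each block carrying its own scale factor. The sketch below fake-quantizes a list under that format, with simplifying assumptions: the block scale is kept in full precision rather than encoded as FP8, there is no per-tensor second-level scale, and ties round toward the smaller magnitude. Names such as `fake_quantize` are hypothetical.

```python
# Magnitudes representable by FP4 E2M1 (the sign is stored separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # number of elements that share one scale factor


def quantize_block(values):
    """Quantize one block to E2M1 with a shared scale; returns (codes, scale)."""
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the block's largest magnitude to 6.0
    codes = []
    for v in values:
        # Nearest grid point; min() keeps the first (smaller) candidate on ties.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        codes.append(mag if v >= 0 else -mag)
    return codes, scale


def fake_quantize(values):
    """Quantize then dequantize a flat list, block by block."""
    out = []
    for start in range(0, len(values), BLOCK):
        codes, scale = quantize_block(values[start:start + BLOCK])
        out.extend(c * scale for c in codes)
    return out


w = [0.1 * i for i in range(-8, 8)]  # one 16-element block of toy weights
print(max(abs(a - b) for a, b in zip(w, fake_quantize(w))))  # worst-case rounding error
```

Small blocks let each group of 16 values use its own range, which limits how much a single outlier degrades the precision of its neighbours.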

Input

  • Input Type(s): Text
  • Input Format(s): String
  • **Input…
