NousResearch/Nemotron
forked from NVIDIA-NeMo/Nemotron
Description: Developer Asset Hub for NVIDIA Nemotron — A one-stop resource for training recipes, usage cookbooks, datasets, and full end-to-end reference examples to build with Nemotron models
License: Apache-2.0
Stars: 12
Forks: 1
Open issues: 0
Created: 2026-04-27T14:08:47Z
Pushed: 2026-05-27T12:38:33Z
Default branch: main
Fork: yes
Parent repository: NVIDIA-NeMo/Nemotron
Archived: no
README:
NVIDIA Nemotron Developer Repository
Open and efficient models for agentic AI. Training recipes, deployment guides, and use-case examples for the Nemotron family.
---
> 🎉 Nemotron 3 Ultra was announced at GTC San Jose 2026. To learn more, [see the usage guide](./usage-cookbook/Nemotron-3-Ultra-Base/README.md)!
---
Why Nemotron?
| | |
|---|---|
| Open Models | Fully transparent training data, techniques, and weights for community innovation |
| Compute Efficiency | Model pruning and optimization enabling higher throughput via TensorRT-LLM |
| High Accuracy | Built on frontier open models with human-aligned reasoning for agentic workflows |
| Flexible Deployment | Deploy anywhere: edge, single GPU, or data center with NIM microservices |
---
Repository Overview
nemotron/
│
├── src/nemotron/recipes/     Training recipes (complete, reproducible pipelines)
│
├── usage-cookbook/           Usage cookbooks (deployment and model usage guides)
│
└── use-case-examples/        Examples of leveraging Nemotron in agentic workflows
Which section should I use?
| | Training Recipes | Usage Cookbooks | Use Case Examples |
|---|---|---|---|
| Purpose | Reproduce full training pipelines from raw data to model | Deploy and use trained models | Build end-to-end applications |
| Format | Python packages with configs, scripts, and evaluation | Jupyter notebooks with step-by-step guides | Jupyter notebooks and scripts |
| When to use | You want to train, fine-tune, or understand how a model was built | You have a model and want to deploy or run inference | You want to build an application (RAG, agents, tool use) |
| Location | [src/nemotron/recipes/](./src/nemotron/recipes/) | [usage-cookbook/](./usage-cookbook/) | [use-case-examples/](./use-case-examples/) |
---
What is Nemotron?
NVIDIA Nemotron is a family of open, high-efficiency multimodal models purpose-built for agentic AI.
Model Tiers:
- Nano — Optimized for edge and PC deployments
- Super — Single GPU deployment with highest throughput
- Ultra — Multi-GPU datacenter applications
Nemotron models excel at coding, math, scientific reasoning, tool calling, instruction following, and visual reasoning. Deploy across edge, single GPU, or data center environments with support for NeMo, TensorRT-LLM, vLLM, SGLang, and NIM microservices.
---
Training Recipes
The Nemotron repository provides reproducible training pipelines from raw data to deployment-ready models. These implementations reflect how large language models are actually trained: careful experimentation, validation gates, and systematic optimization.
Why Complete Pipelines?
Training a production model involves interconnected components. Isolated examples miss how stages interact. Complete pipelines show:
- How data quality affects downstream performance across pretraining, SFT, and RL
- Which training techniques actually work together, not just in theory
- Where validation gates prevent failures and maintain reproducibility
- How to balance competing objectives across stages
Because these are complete systems, you can extract specific techniques with confidence. Each component has been proven to work in context.
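To make the idea of a validation gate concrete, here is a minimal sketch of a gate that blocks promotion of a checkpoint to the next stage unless every required benchmark clears a minimum score. The function `validation_gate`, the metric names, and the thresholds are hypothetical illustrations, not the recipes' actual gating logic or configuration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool
    failures: list[str]


def validation_gate(metrics: dict[str, float], thresholds: dict[str, float]) -> GateResult:
    """Hypothetical gate: pass only if every required metric meets its minimum."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # An unevaluated benchmark counts as a failure, not a skip.
            failures.append(f"{name}: missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} < {minimum:.3f}")
    return GateResult(passed=not failures, failures=failures)


# Illustrative post-SFT scores and thresholds (made-up numbers, not recipe values).
sft_metrics = {"gsm8k_acc": 0.81, "ifeval_acc": 0.74}
thresholds = {"gsm8k_acc": 0.80, "ifeval_acc": 0.75, "humaneval_pass1": 0.60}

gate = validation_gate(sft_metrics, thresholds)
print(gate.passed)    # False
print(gate.failures)  # ['ifeval_acc: 0.740 < 0.750', 'humaneval_pass1: missing']
```

Treating a missing metric as a failure is a deliberate choice in this sketch: a gate that silently skips an unevaluated benchmark would let a broken stage through.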
Each Recipe Includes
- 🎨 Synthetic Data Generation - Scripts to generate synthetic datasets using NVIDIA-NeMo/DataDesigner
- 🗂️ Data Curation - Scripts to prepare training data using NVIDIA NeMo Curator for scalable data processing, filtering, and quality enhancement
- 🔁 Training - Complete training loops with hyperparameters using:
  - NVIDIA-NeMo/Megatron-Bridge for Megatron models
  - NVIDIA-NeMo/Automodel for Hugging Face models
  - NVIDIA-NeMo/NeMo-RL when RL is needed
  - Includes GPU-accelerated last-mile data processing (tokenization + optional sequence packing) for optimal training efficiency
- 📊 Evaluation - Benchmark evaluation on standard suites using NVIDIA NeMo Evaluator
- 📖 Documentation - Detailed explanations of each stage
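The sequence packing mentioned under Training can be illustrated with a short sketch. Packing concatenates several tokenized samples into one fixed-length training sequence so that less of each batch is spent on padding. The version below applies greedy first-fit-decreasing bin packing to sequence lengths on the CPU; it is an assumed, simplified stand-in for the recipes' GPU-accelerated implementation, and `pack_sequences` is a hypothetical name.

```python
from __future__ import annotations


def pack_sequences(lengths: list[int], max_len: int) -> list[list[int]]:
    """Group sample indices into bins of at most max_len tokens (first-fit decreasing)."""
    # Place the longest samples first; short ones then fill the leftover gaps.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: list[list[int]] = []
    space_left: list[int] = []
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sample {i} has {n} tokens, more than max_len={max_len}")
        for b, free in enumerate(space_left):
            if n <= free:  # first existing bin with enough room
                bins[b].append(i)
                space_left[b] -= n
                break
        else:  # no bin had room: open a new one
            bins.append([i])
            space_left.append(max_len - n)
    return bins


lengths = [700, 300, 500, 200, 900, 100]  # tokens per tokenized sample
max_len = 1024
bins = pack_sequences(lengths, max_len)
print(bins)  # [[4, 5], [0, 1], [2, 3]]

padding_packed = len(bins) * max_len - sum(lengths)
padding_unpacked = len(lengths) * max_len - sum(lengths)
print(padding_packed, padding_unpacked)  # 372 3444
```

Here six samples fit into three 1,024-token sequences instead of six padded ones. A real packed batch also needs per-sample boundaries (for example, reset position IDs or a block-diagonal attention mask) so tokens do not attend across samples; that part is omitted.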
Available Recipes
| Model | Description | Stages | Guide |
|-------|-------------|--------|-------|
| [Nemotron 3 Super](docs/nemotron/super3/README.md) | 120.6B total / 12.7B active Hybrid Mamba Latent MoE Transformer for frontier reasoning, coding, and agentic tasks | Pretrain → SFT → RL | [Training Guide](docs/nemotron/super3/README.md) |
| [Nemotron 3 Nano](docs/nemotron/nano3/README.md) | 31.6B total / 3.6B active MoE Hybrid Mamba-Transformer for agentic reasoning | Pretrain → SFT → RL | [Training Guide](docs/nemotron/nano3/README.md) |
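In a Mixture-of-Experts model, only the routed experts (plus the shared layers) run for each token, so the "active" count is the number of parameters used per token. Working the ratios from the parameter counts in the recipe table:

```latex
\frac{N_{\text{active}}}{N_{\text{total}}}\bigg|_{\text{Super}} = \frac{12.7\,\text{B}}{120.6\,\text{B}} \approx 0.105,
\qquad
\frac{N_{\text{active}}}{N_{\text{total}}}\bigg|_{\text{Nano}} = \frac{3.6\,\text{B}}{31.6\,\text{B}} \approx 0.114
```

So each token touches roughly a tenth of the weights in both models. As a general rule of thumb for MoE models (not a figure from the Nemotron tech report), per-token compute tracks the active count while weight memory tracks the total count.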
Nemotron 3 Super
A complete training recipe for the frontier Hybrid Mamba Latent Mixture-of-Experts Transformer model with state-of-the-art reasoning, coding, and agentic capabilities.
> Open-Source Data Only: These recipes train exclusively on the open-sourced subset of training data. Results will differ from the tech report benchmarks, which used additional proprietary data. Use these recipes as reference implementations to apply the methodology with your own data.
Model Specifications:
- 120.6B total / 12.7B active parameters
- Multi-stage RL pipeline: 3× RLVR + 2× SWE-RL + RLHF across 21 reward environments
- Asynchronous GRPO with decoupled training and inference
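The group-relative advantage at the core of GRPO (Group Relative Policy Optimization) fits in a few lines. For each prompt, several rollouts are sampled and scored, and each rollout's advantage is its reward normalized by the mean and standard deviation of its own group, which removes the need for a separate value network. This is a minimal sketch assuming population standard deviation and a small epsilon; it omits the clipped policy-gradient loss, any KL term, and the asynchronous split between training and inference that the recipe uses, and it is not NeMo-RL's implementation.

```python
from __future__ import annotations

import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each rollout's reward by its group's mean and (population) std.

    `rewards` holds the scores of G rollouts sampled for the same prompt, so the
    group mean acts as the baseline instead of a learned critic.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division finite; if every rollout scores the same,
    # all advantages are 0 and the prompt contributes no gradient signal.
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one prompt, scored by a binary verifiable reward (1 = correct).
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
```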
What You Can Extract:
- Large-scale pretraining with data curriculum
- Multi-domain SFT pipeline
- Multi-environment RLVR with 21 simultaneous reward environments
- SWE-RL with container-isolated sandbox execution
- GenRM-based RLHF with principle-following rewards
- Asynchronous GRPO at 1K GPU scale
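One way to picture multi-environment RLVR is a registry that routes each prompt to the verifier for the environment it came from, so a single batch can mix tasks. The environment names, the `score` helper, and both verifiers below are hypothetical; the recipe's 21 environments and their interfaces are not described in this README.

```python
from __future__ import annotations

import re
from typing import Callable

# A verifier maps (model response, reference answer) -> scalar reward.
RewardFn = Callable[[str, str], float]


def boxed_math_reward(response: str, reference: str) -> float:
    """1.0 if the last \\boxed{...} answer in the response equals the reference."""
    answers = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return 1.0 if answers and answers[-1].strip() == reference.strip() else 0.0


def exact_match_reward(response: str, reference: str) -> float:
    """1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if response.strip().lower() == reference.strip().lower() else 0.0


# Hypothetical registry: environment name -> verifier.
ENVIRONMENTS: dict[str, RewardFn] = {
    "math": boxed_math_reward,
    "short_qa": exact_match_reward,
}


def score(sample: dict, response: str) -> float:
    """Route a sample to its environment's verifier; fail loudly if none exists."""
    env = sample["env"]
    if env not in ENVIRONMENTS:
        raise KeyError(f"no verifier registered for environment {env!r}")
    return ENVIRONMENTS[env](response, sample["reference"])


# One mixed batch drawing on two environments.
batch = [
    {"env": "math", "reference": "42"},
    {"env": "short_qa", "reference": "Paris"},
]
responses = ["Adding the terms gives \\boxed{41}.", " paris "]
scores = [score(s, r) for s, r in zip(batch, responses)]
print(scores)  # [0.0, 1.0]
```

Keeping verifiers as plain functions keyed by environment name makes it cheap to add an environment without touching the training loop, and raising on an unregistered environment avoids silently assigning a zero reward.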
Resources:
- [Training Guide](docs/nemotron/super3/README.md)
- [Tech…