What does this fork signal mean?

Nous Research forked NVIDIA-NeMo/Nemotron as NousResearch/Nemotron. This fork signal points to upstream code the lab may be inspecting, patching, or building on. High-signal details: repo NousResearch/Nemotron · parent NVIDIA-NeMo/Nemotron · Routine fork, low stars. onlylabs links this event to 1 captured evidence page and 6 related fork signals.

Nous Research Fork: NousResearch/Nemotron

Captured source

GitHub/github.com/NousResearch/Nemotron

NousResearch/Nemotron repository metadata

published Apr 27, 2026 · seen Jun 6 · captured Jun 11 · http 200 · method plain

NousResearch/Nemotron

Description: Developer Asset Hub for NVIDIA Nemotron — A one-stop resource for training recipes, usage cookbooks, datasets, and full end-to-end reference examples to build with Nemotron models

License: Apache-2.0

Stars: 12

Forks: 1

Open issues: 0

Created: 2026-04-27T14:08:47Z

Pushed: 2026-05-27T12:38:33Z

Default branch: main

Fork: yes

Parent repository: NVIDIA-NeMo/Nemotron

Archived: no

README:

NVIDIA Nemotron Developer Repository

Open and efficient models for agentic AI. Training recipes, deployment guides, and use-case examples for the Nemotron family.

---

> 🎉 Nemotron 3 Ultra was announced at GTC San Jose 2026. To learn more, [see the usage guide](./usage-cookbook/Nemotron-3-Ultra-Base/README.md)!

---

Why Nemotron?

| | |
|---|---|
| Open Models | Fully transparent training data, techniques, and weights for community innovation |
| Compute Efficiency | Model pruning and optimization enabling higher throughput via TensorRT-LLM |
| High Accuracy | Built on frontier open models with human-aligned reasoning for agentic workflows |
| Flexible Deployment | Deploy anywhere: edge, single GPU, or data center with NIM microservices |

---

Repository Overview

nemotron/
│
├── src/nemotron/recipes/ Training recipes (complete, reproducible pipelines)
│
├── usage-cookbook/ Usage cookbooks (deployment and model usage guides)
│
└── use-case-examples/ Examples of leveraging Nemotron in agentic workflows

Which section should I use?

| | Training Recipes | Usage Cookbooks | Use Case Examples |
|---|---|---|---|
| Purpose | Reproduce full training pipelines from raw data to model | Deploy and use trained models | Build end-to-end applications |
| Format | Python packages with configs, scripts, and evaluation | Jupyter notebooks with step-by-step guides | Jupyter notebooks and scripts |
| When to use | You want to train, fine-tune, or understand how a model was built | You have a model and want to deploy or run inference | You want to build an application (RAG, agents, tool use) |
| Location | [src/nemotron/recipes/](./src/nemotron/recipes/) | [usage-cookbook/](./usage-cookbook/) | [use-case-examples/](./use-case-examples/) |

---

What is Nemotron?

NVIDIA Nemotron is a family of open, high-efficiency multimodal models purpose-built for agentic AI.

Model Tiers:

Nano — Optimized for edge and PC deployments
Super — Single GPU deployment with highest throughput
Ultra — Multi-GPU datacenter applications

Nemotron models excel at coding, math, scientific reasoning, tool calling, instruction following, and visual reasoning. Deploy across edge, single GPU, or data center environments with support for NeMo, TensorRT-LLM, vLLM, SGLang, and NIM microservices.

---

Training Recipes

The Nemotron repository provides reproducible training pipelines from raw data to deployment-ready models. These implementations reflect how large language models are actually trained: careful experimentation, validation gates, and systematic optimization.

Why Complete Pipelines?

Training a production model involves interconnected components. Isolated examples miss how stages interact. Complete pipelines show:

How data quality affects downstream performance across pretraining, SFT, and RL
Which training techniques actually work together, not just in theory
Where validation gates prevent failures and maintain reproducibility
How to balance competing objectives across stages

Because these are complete systems, you can extract specific techniques with confidence. Each component has been proven to work in context.

Each Recipe Includes

🎨 Synthetic Data Generation - Scripts to generate synthetic datasets using NVIDIA-NeMo/DataDesigner
🗂️ Data Curation - Scripts to prepare training data using NVIDIA NeMo Curator for scalable data processing, filtering, and quality enhancement
🔁 Training - Complete training loops with hyperparameters using:
  NVIDIA-NeMo/Megatron-Bridge for Megatron models
  NVIDIA-NeMo/Automodel for HuggingFace models
  NVIDIA-NeMo/NeMo-RL when RL is needed
  Includes GPU-accelerated last-mile data processing (tokenization + optional sequence packing) for optimal training efficiency
📊 Evaluation - Benchmark evaluation on standard suites using NVIDIA NeMo Evaluator
📖 Documentation - Detailed explanations of each stage
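
Sequence packing, mentioned under the Training item above, concatenates several short tokenized samples into one fixed-length row so that fewer positions are wasted on padding. The sketch below illustrates the idea with a CPU-only first-fit-decreasing heuristic. It is not the repository's GPU-accelerated implementation, and `pack_sequences` and `max_len` are hypothetical names chosen for this example.

```python
from typing import List


def pack_sequences(lengths: List[int], max_len: int) -> List[List[int]]:
    """Group sequence indices into bins whose total length fits max_len.

    Illustrative first-fit-decreasing heuristic (an assumption, not the
    recipe's actual packing algorithm).
    """
    bins: List[List[int]] = []   # sequence indices assigned to each packed row
    free: List[int] = []         # remaining token capacity of each packed row

    # Place the longest sequences first; they are the hardest to fit.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sequence {i} has {n} tokens, more than max_len={max_len}")
        for b, room in enumerate(free):
            if n <= room:        # first existing row with enough room
                bins[b].append(i)
                free[b] -= n
                break
        else:                    # no row had room: open a new one
            bins.append([i])
            free.append(max_len - n)
    return bins


if __name__ == "__main__":
    token_counts = [5, 3, 4, 2, 6]
    rows = pack_sequences(token_counts, max_len=8)
    padded_rows_without_packing = len(token_counts)
    print(f"{padded_rows_without_packing} padded rows become {len(rows)} packed rows: {rows}")
```

A real pipeline would also record the boundaries between samples in each row so attention does not cross from one sample into the next.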

Available Recipes

| Model | Description | Stages | Guide |
|-------|-------------|--------|-------|
| [Nemotron 3 Super](docs/nemotron/super3/README.md) | 120.6B total / 12.7B active Hybrid Mamba Latent MoE Transformer for frontier reasoning, coding, and agentic tasks | Pretrain → SFT → RL | [Training Guide](docs/nemotron/super3/README.md) |
| [Nemotron 3 Nano](docs/nemotron/nano3/README.md) | 31.6B total / 3.6B active MoE Hybrid Mamba-Transformer for agentic reasoning | Pretrain → SFT → RL | [Training Guide](docs/nemotron/nano3/README.md) |
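
The "total / active" figures in the table describe Mixture-of-Experts models, where only a subset of parameters is used for each token. As a quick worked example, the snippet below computes the active share from the numbers quoted in the table. The dictionary and the function name are illustrative only.

```python
# Parameter counts in billions, copied from the Available Recipes table.
MODELS = {
    "Nemotron 3 Super": {"total_b": 120.6, "active_b": 12.7},
    "Nemotron 3 Nano": {"total_b": 31.6, "active_b": 3.6},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters active per token (active / total)."""
    return active_b / total_b


if __name__ == "__main__":
    for name, p in MODELS.items():
        share = active_fraction(p["total_b"], p["active_b"])
        print(f"{name}: {share:.1%} of parameters active per token")
```

Both models activate roughly a tenth of their parameters per token, so per-token compute tracks the active count rather than the total.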

Nemotron 3 Super

A complete training recipe for the frontier Hybrid Mamba Latent Mixture-of-Experts Transformer model with state-of-the-art reasoning, coding, and agentic capabilities.

> Open-Source Data Only: These recipes train exclusively on the open-sourced subset of training data. Results will differ from the tech report benchmarks, which used additional proprietary data. Use these recipes as reference implementations to apply the methodology with your own data.

Model Specifications:

120B total / 12B active parameters
Multi-stage RL pipeline: 3× RLVR + 2× SWE-RL + RLHF across 21 reward environments
Asynchronous GRPO with decoupled training and inference

What You Can Extract:

Large-scale pretraining with data curriculum
Multi-domain SFT pipeline
Multi-environment RLVR with 21 simultaneous reward environments
SWE-RL with container-isolated sandbox execution
GenRM-based RLHF with principle-following rewards
Asynchronous GRPO at 1K GPU scale
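
The excerpt does not spell out the GRPO formulation the recipe uses. As commonly described, GRPO (Group Relative Policy Optimization) scores each sampled response against the other responses drawn for the same prompt, rather than against a learned value function. The sketch below shows only that group-relative advantage step, assuming mean and standard-deviation normalization. The function name and `eps` are hypothetical, and the asynchronous scheduling and policy update are not modeled.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's group of sampled responses.

    Assumption: advantage_i = (r_i - mean(r)) / (std(r) + eps), the form
    usually given for GRPO; the recipe's exact variant may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four responses to one prompt, scored by a verifiable reward (1 = pass).
    group_rewards = [1.0, 0.0, 0.0, 1.0]
    for reward, adv in zip(group_rewards, group_relative_advantages(group_rewards)):
        print(f"reward={reward:.1f} advantage={adv:+.3f}")
```

When every response in a group receives the same reward, all advantages are zero and that group contributes no gradient signal.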

Resources:

[Training Guide](docs/nemotron/super3/README.md)
[Tech...

Excerpt shown — open the source for the full document.

Notability

notability 1.0/10

Routine fork, low stars