stepfun-ai/NextStep-1
Description: [🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by StepFun's Multimodal Intelligence team.
Language: Python
License: Apache-2.0
Stars: 689
Forks: 26
Open issues: 0
Created: 2025-08-14T08:50:25Z
Pushed: 2026-02-27T17:05:44Z
Default branch: main
Fork: no
Archived: no
README:
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
> Autoregressive models, which generate content step by step like reading a sentence, excel in language but struggle with images. Traditionally, they either depend on costly diffusion models or compress images into discrete, lossy tokens via vector quantization (VQ).
>
> NextStep-1 takes a different path: a 14B-parameter autoregressive model that works directly with continuous image tokens, preserving the full richness of visual data. It jointly models sequences of discrete text tokens and continuous image tokens, using a standard LM head for text and a lightweight 157M-parameter flow matching head for visuals. This unified next-token prediction framework is simple, scalable, and capable of producing stunningly detailed images.
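The two-head design can be made concrete with the two per-token objectives:

- a text token is scored by cross-entropy over the LM head's logits;
- a continuous image token is scored by a flow-matching regression loss.

The following pure-Python sketch makes a few assumptions:

- It uses a linear (rectified-flow style) path from Gaussian noise `x0` to the clean token `x1`, whose target velocity is `x1 - x0`.
- It omits the conditioning on the transformer's hidden state.

NextStep-1's exact parameterization and noise schedule are not described here, and all function names are hypothetical.

```python
import math
import random


def text_token_loss(logits, target_id):
    """Cross-entropy of LM-head logits against the next text token id."""
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target_id]


def interpolate(x0, x1, t):
    """Point on the linear path x_t = (1 - t) * x0 + t * x1 (noise at t=0, data at t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def image_token_loss(pred_velocity, x0, x1):
    """Mean squared error between a predicted velocity and the path velocity x1 - x0."""
    target = [b - a for a, b in zip(x0, x1)]
    return sum((p - q) ** 2 for p, q in zip(pred_velocity, target)) / len(target)


def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    random.seed(0)
    x1 = [0.5, -1.0, 2.0]                      # toy "clean" continuous image token
    x0 = [random.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    t = random.random()
    print("x_t:", interpolate(x0, x1, t))      # noisy input the flow head would see
    oracle = [b - a for a, b in zip(x0, x1)]   # a perfect head predicts exactly x1 - x0
    print(image_token_loss(oracle, x0, x1))    # 0.0
    print(euler_sample(lambda x, s: oracle, x0))  # approximately x1
    print(text_token_loss([2.0, 0.5, -1.0], target_id=0))
```

In a real model, `pred_velocity` would come from the flow head given `x_t`, `t` and the hidden state at that position. The text and image losses would then be combined over the sequence. How NextStep-1 weights them is not stated in this excerpt.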
🔥 News
- Feb. 25, 2026: vLLM-Omni now supports high-performance inference of NextStep-1.1. Please check here for details!
- Feb. 16, 2026: The training code of NextStep-1 (this repo) and the post-training blog posts for NextStep-1.1 (link) have been released. Discussion and contributions are welcome. Happy Chinese New Year!
- Feb. 6, 2026: NextStep-1 has been selected for an Oral Presentation at ICLR 2026! 🎉🎉🎉
- Dec. 24, 2025: 🔥 We released NextStep-1.1, a text-to-image model that substantially improves output quality through extended training and a flow-based reinforcement learning (RL) post-training paradigm. Feel free to try the checkpoints hosted on our Hugging Face repo!
  Checkpoints are available on:
  - 🤗 Hugging Face:
    - Pretrain: NextStep-1.1-Pretrain
    - Post-train: NextStep-1.1
  - 🇨🇳 ModelScope:
    - Pretrain: NextStep-1.1-Pretrain
    - Post-train: NextStep-1.1
- Aug. 18, 2025: 👋 We deployed NextStep-1-Large-Edit on Hugging Face Spaces. Feel free to try it out!
- Aug. 18, 2025: 👋 We opened a [WeChat Group](./assets/wechat.png). Feel free to join us!
- Aug. 14, 2025: 👋 We released the inference code and Hugging Face model weights of NextStep-1-Large-Pretrain, NextStep-1-Large, and NextStep-1-Large-Edit.
- Aug. 14, 2025: 👋 We have made our technical report publicly available.
---
📑 Table of Contents
- [🔥 News](#-news)
- [📦 Installation & Environment](#-installation--environment)
- [📥 Model & Data Preparation](#-model--data-preparation)
  - [2.1 Download Model Weights](#21-download-model-weights)
  - [2.2 Download Training Datasets](#22-download-training-datasets)
  - [2.3 Process Custom Data (Optional)](#23-process-custom-data-optional)
- [🚀 Training](#-training)
  - [3.1 Start Training (via `smartrun`)](#31-start-training-via-smartrun)
  - [3.2 Override Training Parameters](#32-override-training-parameters)
  - [3.3 Inspect and Compare Configurations](#33-inspect-and-compare-configurations)
- [🔮 Inference](#-inference)
  - [4.1 Convert Checkpoint Format](#41-convert-checkpoint-format)
  - [4.2 Run Inference](#42-run-inference)
- [📚 References](#-references)
- [📄 License](#-license)
- [📖 Citation](#-citation)
---
📦 Installation & Environment
1.1 Clone the Repository
```bash
git clone https://github.com/stepfun-ai/NextStep-1
cd NextStep-1
```
1.2 Create Conda Environment
```bash
conda create -n nextstep python=3.10 -y
conda activate nextstep
```
1.3 Install Dependencies
> ⚠️ Note: We recommend pre-installing a PyTorch build that matches your CUDA version.
```bash
pip install uv
uv pip install -e .
```
> ☕ Tip: This installation may take a while. Grab a cup of coffee and take a break! ☕
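You can quickly check that the pre-installed PyTorch build matches your CUDA setup. The snippet below is a generic sketch, not a script shipped with this repository, and the function name is hypothetical. It relies only on standard PyTorch attributes (`torch.version.cuda`, `torch.cuda.is_available()`). Compare `built_with_cuda` against the CUDA version reported by `nvidia-smi`.

```python
import importlib.util


def describe_torch_env():
    """Report whether PyTorch is installed and which CUDA version it was built against.

    CUDA fields stay None/False when torch is missing or is a CPU-only build.
    """
    info = {"torch_installed": False, "torch_version": None,
            "built_with_cuda": None, "cuda_available": False}
    if importlib.util.find_spec("torch") is None:
        return info
    import torch  # imported lazily so the check also runs without PyTorch

    info.update(
        torch_installed=True,
        torch_version=str(torch.__version__),
        built_with_cuda=torch.version.cuda,  # e.g. "12.1", or None for CPU builds
        cuda_available=bool(torch.cuda.is_available()),
    )
    return info


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```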
1.4 Built-in CLI Tools
The following CLI tools are available after installation:
- `smartrun`: An intelligent distributed launcher that automatically wraps `torchrun` parameters.
- `gen_meta`: Scans datasets to generate metadata indices (sample counts, checksums, etc.).
- `warmup_data`: Pre-warms and caches data indices to significantly speed up training startup.
- `eshow`: Inspect or compare experiment configurations.
- `singlegpu_debug` / `multigpu_debug`: Dedicated debug entry points for attaching a remote debugger.
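This excerpt does not document `smartrun`'s internals. As a rough illustration of what wrapping `torchrun` parameters involves, the sketch below assembles a `torchrun` command line from cluster environment variables.

- The flags it emits (`--nnodes`, `--nproc_per_node`, `--node_rank`, `--master_addr`, `--master_port`) are standard `torchrun` options.
- The environment variable names (`NNODES`, `NPROC_PER_NODE`, `NODE_RANK`, `MASTER_ADDR`, `MASTER_PORT`) and their defaults are assumptions.
- The function name is hypothetical.

```python
import os
import shlex


def build_torchrun_cmd(script, script_args=(), env=None, default_gpus=8):
    """Assemble a torchrun command line from cluster environment variables.

    The environment variable names and defaults are illustrative assumptions;
    the torchrun flags themselves are standard.
    """
    env = os.environ if env is None else env
    return [
        "torchrun",
        f"--nnodes={int(env.get('NNODES', 1))}",
        f"--nproc_per_node={int(env.get('NPROC_PER_NODE', default_gpus))}",
        f"--node_rank={int(env.get('NODE_RANK', 0))}",
        f"--master_addr={env.get('MASTER_ADDR', '127.0.0.1')}",
        f"--master_port={int(env.get('MASTER_PORT', 29500))}",
        script,
        *script_args,
    ]


if __name__ == "__main__":
    # "train.py" and the config path are placeholders, not files from this repo.
    cmd = build_torchrun_cmd("train.py", ["--config", "configs/example.py"])
    print(shlex.join(cmd))  # a real launcher would execute this, e.g. subprocess.run(cmd)
```

Every node calls the same function; only `NODE_RANK` differs per node, which is what lets a single launcher command scale from one node to many.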
---
📥 Model & Data Preparation
2.1 Download Model Weights
Download models to ./nextstep_models. Please update the corresponding paths in nextstep/model_zoos.py.
```bash
bash download_models.sh
```
> ☕ Tip: This download may take a while. Grab a cup of coffee and take a break! ☕
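After the download finishes, checking that every configured model path actually exists can save a failed launch. The mapping below is hypothetical: this excerpt does not show the real structure of `nextstep/model_zoos.py`, and the directory names are placeholders to replace with the paths you configured.

```python
from pathlib import Path

# Hypothetical name -> local path mapping; mirror what nextstep/model_zoos.py actually defines.
MODEL_PATHS = {
    "nextstep-1.1": "./nextstep_models/NextStep-1.1",
    "nextstep-1.1-pretrain": "./nextstep_models/NextStep-1.1-Pretrain",
}


def missing_models(paths):
    """Return the names whose configured path is not an existing, non-empty directory."""
    missing = []
    for name, path in paths.items():
        p = Path(path)
        # iterdir() is only reached when p is a directory, thanks to short-circuiting
        if not p.is_dir() or not any(p.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_models(MODEL_PATHS)
    if missing:
        print("Missing or empty model directories:", ", ".join(missing))
    else:
        print("All configured model paths look present.")
```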
Available Models
The following table lists all available models and their training stages:
| Model | Pre-Training 256px | Pre-Training 512px | Annealing | RL | Visual Diversity | Fine-Tunability | Hugging Face |
|-------|-------------------|-------------------|----------|----|-----------|------------------|--------------|
> ⚠️ Note: The NextStep-1 series models come from an older release and perform worse than NextStep-1.1, so we do not recommend them. Please use the NextStep-1.1 series models instead.
> 💡 Quick Inference: To run inference with the model quickly, use the inference script below.
```bash
python3 inference/inference.py
```
2.2 Download Training Datasets
Download datasets to ./nextstep_data.
```bash
bash download_datasets.sh
```
> ☕ Tip: This download may take a while. Grab a cup of coffee and take a break! ☕
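Section 1.4 lists two related tools:

- `gen_meta`, which builds metadata indices (sample counts, checksums);
- `warmup_data`, which caches indices to speed up training startup.

Their on-disk formats are not shown in this excerpt. The sketch below illustrates the general idea under clearly hypothetical assumptions:

- Shards are JSON Lines files with one sample per non-empty line.
- The index is cached as a `meta.json` file.
- A shard is re-scanned only when its size or modification time changes.

```python
import hashlib
import json
from pathlib import Path


def shard_entry(path):
    """Count samples (non-empty lines) and SHA-256 the bytes of one JSONL shard."""
    digest = hashlib.sha256()
    count = 0
    with open(path, "rb") as f:
        for line in f:  # binary lines keep their b"\n", so this hashes the whole file
            digest.update(line)
            if line.strip():
                count += 1
    st = path.stat()
    return {"num_samples": count, "sha256": digest.hexdigest(),
            "size": st.st_size, "mtime": st.st_mtime}


def build_index(data_dir, cache_name="meta.json"):
    """Build (or reuse) a per-shard metadata index, re-scanning only changed shards."""
    data_dir = Path(data_dir)
    cache_path = data_dir / cache_name
    cached = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    index = {}
    for shard in sorted(data_dir.glob("*.jsonl")):
        old = cached.get(shard.name)
        st = shard.stat()
        if old and old["size"] == st.st_size and old["mtime"] == st.st_mtime:
            index[shard.name] = old  # unchanged since last scan: reuse cached entry
        else:
            index[shard.name] = shard_entry(shard)
    cache_path.write_text(json.dumps(index, indent=2))
    return index
```

The size/mtime comparison is a cheap heuristic. A stricter variant could recompute checksums on every run, at the cost of slower startup, which is exactly what a cached index is meant to avoid.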
> ⚠️ Important Note: The datasets provided in download_datasets.sh are only example open-source datasets for demonstration purposes. NextStep's actual training utilized approximately **1…
Notability: 6.0/10. Notable repo with 687 stars, moderate traction.