stepfun-ai/Step-3.5-Flash-Base
Step 3.5 Flash Base
1. Introduction
Step 3.5 Flash is our most capable open-source foundation model, engineered to deliver frontier reasoning and agentic capabilities with exceptional efficiency. Built on a sparse Mixture of Experts (MoE) architecture, it activates only 11B of its 196B parameters per token. This "intelligence density" allows it to rival the reasoning depth of top-tier proprietary models while maintaining the agility required for real-time interaction. We have also open-sourced the training codebase (SteptronOss), which supports continued pretraining, SFT, RL (WIP), and evaluation (WIP), and we will open-source the SFT data.
2. Key Capabilities
- Deep Reasoning at Speed: While chatbots are built for reading, agents must reason fast. Powered by 3-way Multi-Token Prediction (MTP-3), Step 3.5 Flash achieves a generation throughput of 100–300 tok/s in typical usage (peaking at 350 tok/s for single-stream coding tasks). This allows for complex, multi-step reasoning chains with immediate responsiveness.
- A Robust Engine for Coding & Agents: Step 3.5 Flash is purpose-built for agentic tasks, integrating a scalable RL framework that drives consistent self-improvement. It achieves 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0, demonstrating its ability to handle sophisticated, long-horizon tasks with consistent stability.
- Efficient Long Context: The model supports a cost-efficient 256K context window by using a 3:1 Sliding Window Attention (SWA) ratio, i.e., three SWA layers for every full-attention layer. This hybrid approach maintains consistent performance on long documents and large codebases while significantly reducing the computational overhead typical of standard long-context models.
- Accessible Local Deployment: Optimized for accessibility, Step 3.5 Flash brings elite-level intelligence to local environments. It runs securely on high-end consumer hardware (e.g., Mac Studio M4 Max, NVIDIA DGX Spark), ensuring data privacy without sacrificing performance.
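The 3:1 hybrid layout described in the list above can be illustrated with a small sketch. It assumes that every fourth layer is the full-attention layer and that each SWA layer caches at most `window` tokens. The actual layer ordering and window size of Step 3.5 Flash are not stated here, so `layer_pattern`, `kv_cache_fraction`, and the 4,096-token window in the usage lines are hypothetical.

```python
from typing import List


def layer_pattern(num_layers: int, swa_per_full: int = 3) -> List[str]:
    """Per-layer attention types: `swa_per_full` sliding-window layers, then one
    full-attention layer, repeated (hypothetical ordering)."""
    period = swa_per_full + 1
    return ["full" if i % period == swa_per_full else "swa" for i in range(num_layers)]


def kv_cache_fraction(pattern: List[str], seq_len: int, window: int) -> float:
    """KV-cache size of the hybrid stack relative to an all-full-attention stack.

    Full-attention layers cache every token; SWA layers cache at most `window` tokens.
    """
    cached = sum(seq_len if kind == "full" else min(seq_len, window) for kind in pattern)
    return cached / (len(pattern) * seq_len)


if __name__ == "__main__":
    pattern = layer_pattern(8)
    print(pattern)  # ['swa', 'swa', 'swa', 'full', 'swa', 'swa', 'swa', 'full']
    # 256K-token context with a hypothetical 4,096-token sliding window
    print(f"{kv_cache_fraction(pattern, 256 * 1024, 4096):.3f}")  # 0.262
```

Under these assumptions the hybrid stack keeps roughly a quarter of the KV cache that full attention in every layer would need at 256K tokens, because only one layer in four attends over the whole context.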
3. Performance
Step 3.5 Flash delivers performance parity with leading closed-source systems while remaining open and efficient.

Performance of Step 3.5 Flash measured across Reasoning, Coding, and Agentic abilities. Open-source models (left) are sorted by total parameter count; top-tier proprietary models are shown on the right. xbench-DeepSearch scores are taken from official publications for consistency. The shaded bars show the improved performance of Step 3.5 Flash with Parallel Thinking.
Detailed Benchmarks
| Benchmark | # Shots | Step3.5 Flash (Base) | MiMo‑V2 Flash (Base) | GLM‑4.5 (Base) | DeepSeek V3.1 (Base) | DeepSeek V3.2 (Exp Base) | Kimi‑K2 (Base) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| # Activated Params | - | 11B | 15B | 32B | 37B | 37B | 32B |
| # Total Params | - | 196B | 309B | 355B | 671B | 671B | 1043B |
| General | | | | | | | |
| BBH | 3-shot | 88.2 | 88.5 | 86.2 | 88.2† | 88.7† | 88.7 |
| MMLU | 5-shot | 85.8 | 86.7 | 86.1 | 87.4† | 87.8† | 87.8 |
| MMLU‑Redux | 5-shot | 89.2 | 90.6 | - | 90.0† | 90.4† | 90.2 |
| MMLU‑Pro | 5-shot | 62.3 | 73.2 | - | 58.8† | 62.1† | 69.2 |
| HellaSwag | 10-shot | 90.2 | 88.5 | 87.1 | 89.2† | 89.4† | 94.6 |
| WinoGrande | 5-shot | 79.1 | 83.8 | - | 85.9† | 85.6† | 85.3 |
| GPQA | 5-shot | 41.7 | 43.5* | 33.5* | 43.1* | 37.3* | 43.1* |
| SuperGPQA | 5-shot | 41.0 | 41.1 | - | 42.3† | 43.6† | 44.7 |
| SimpleQA | 5-shot | 31.6 | 20.6 | 30.0 | 26.3† | 27.0† | 35.3 |
| Mathematics | | | | | | | |
| GSM8K | 8-shot | 88.2 | 92.3 | 87.6 | 91.4† | 91.1† | 92.1 |
| MATH | 4-shot | 66.8 | 71.0 | 62.6 | 62.6† | 62.5† | 70.2 |
| Code | | | | | | | |
| HumanEval | 3-shot | 81.1 | 77.4* | 79.8* | 72.5* | 67.7* | 84.8* |
| MBPP | 3-shot | 79.4 | 81.0* | 81.6* | 74.6* | 75.6* | 89.0* |
| HumanEval+ | 0-shot | 72.0 | 70.7 | - | 64.6† | 67.7† | - |
| MBPP+ | 0-shot | 70.6 | 71.4 | - | 72.2† | 69.8† | - |
| MultiPL‑E HumanEval | 0-shot | 67.7 | 59.5 | - | 45.9† | 45.7† | 60.5 |
| MultiPL‑E MBPP | 0-shot | 58.0 | 56.7 | - | 52.5† | 50.6† | 58.8 |
| Chinese | | | | | | | |
| C‑EVAL | 5-shot | 89.6 | 87.9 | 86.9 | 90.0† | 91.0† | 92.5 |
| CMMLU | 5-shot | 88.9 | 87.4 | - | 88.8† | 88.9† | 90.9 |
| C‑SimpleQA | 5-shot | 63.2 | 61.5 | 70.1 | 70.9† | 68.0† | 77.6 |
1. “*” denotes cases where the original score was unavailable; we report results evaluated under the same test conditions as Step3.5 Flash for fair comparison.
2. “†” indicates DeepSeek scores quoted from the MiMo‑V2‑Flash report.
Recommended Inference Parameters
1. For general chat, we suggest temperature=0.6, top_p=0.95.
2. For reasoning and agentic scenarios, we recommend temperature=1.0, top_p=0.95.
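To make these two knobs concrete, here is a minimal pure-Python sketch of temperature scaling followed by top-p (nucleus) filtering, the standard technique the parameter names refer to. It is illustrative only: inference engines implement this internally, and `SAMPLING_PRESETS`, `top_p_filter`, and `sample` are hypothetical names.

```python
import math
import random
from typing import Dict, List, Optional

# Presets taken from the recommendations above; the dict itself is hypothetical.
SAMPLING_PRESETS: Dict[str, Dict[str, float]] = {
    "chat": {"temperature": 0.6, "top_p": 0.95},
    "reasoning_agent": {"temperature": 1.0, "top_p": 0.95},
}


def top_p_filter(logits: List[float], temperature: float, top_p: float) -> Dict[int, float]:
    """Scale logits by 1/temperature, softmax, then keep the smallest set of most
    likely tokens whose cumulative probability reaches `top_p`.
    Returns renormalized probabilities keyed by token id."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits: List[float], preset: str = "chat", rng: Optional[random.Random] = None) -> int:
    """Draw one token id using one of the presets."""
    params = SAMPLING_PRESETS[preset]
    dist = top_p_filter(logits, params["temperature"], params["top_p"])
    rng = rng or random.Random()
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
```

A lower temperature sharpens the distribution, so the nucleus shrinks: with logits `[2.0, 1.0, 0.0, -5.0]`, the chat preset (0.6) keeps two candidate tokens while the reasoning/agent preset (1.0) keeps three, leaving more room for exploratory continuations.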
4. Architecture Details
Step 3.5 Flash is built on a Sparse Mixture-of-Experts (MoE) transformer architecture, optimized for high throughput and low VRAM usage during inference.
4.1 Technical Specifications
| Component | Specification |
| :--- | :--- |
| Backbone | 45-layer Transformer (4,096 hidden dim) |
| Context Window | 256K |
| Vocabulary | 128,896 tokens |
| Total Parameters | 196.81B (196B Backbone + 0.81B Head) |
| Active Parameters | ~11B (per token generation) |
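The specification table can be captured as a small config object to make the parameter arithmetic explicit. This is a sketch: the class and field names are hypothetical, the values are copied from the table, and reading "256K" as 262,144 tokens is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepFlashSpec:
    """Values copied from the specification table; the class itself is hypothetical."""
    num_layers: int = 45
    hidden_dim: int = 4096
    context_window: int = 256 * 1024  # "256K", assumed to mean 262,144 tokens
    vocab_size: int = 128_896
    backbone_params_b: float = 196.0  # billions
    head_params_b: float = 0.81  # billions
    active_params_b: float = 11.0  # approximate ("~11B")

    @property
    def total_params_b(self) -> float:
        return self.backbone_params_b + self.head_params_b

    @property
    def active_fraction(self) -> float:
        return self.active_params_b / self.total_params_b


spec = StepFlashSpec()
print(f"{spec.total_params_b:.2f}B total, {spec.active_fraction:.1%} active per token")
# 196.81B total, 5.6% active per token
```

Roughly one parameter in eighteen participates in each token's forward pass, which is the ratio behind the "intelligence density" framing in the introduction.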
4.2 Mixture of Experts (MoE) Routing
Unlike traditional dense models, Step 3.5 Flash uses a fine-grained routing strategy to maximize efficiency:
- Fine-Grained Experts: 288 routed experts per layer + 1 shared expert (always active).
- Sparse Activation: Only the Top-8 experts are selected per token.
- Result: The model retains the capacity ("memory") of a 196B-parameter model but runs with roughly the per-token compute of an 11B model.
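The routing described in the list above can be sketched in pure Python. The sketch assumes softmax gating over the top-8 router scores and adds the shared expert's output unweighted; Step 3.5 Flash's actual gating function, normalization, and any load-balancing terms are not specified here, and all names (`route`, `moe_layer`, `make_expert`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]

NUM_ROUTED_EXPERTS = 288
TOP_K = 8


def route(router_logits: Sequence[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and softmax their scores (assumed gating)."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    z = sum(exps.values())
    return [(i, exps[i] / z) for i in top]


def moe_layer(x: Vector, router_logits: Sequence[float], routed: Sequence[Expert],
              shared: Expert, k: int = TOP_K) -> Vector:
    """The shared expert always runs; only the k selected routed experts run."""
    out = shared(x)
    for idx, weight in route(router_logits, k):
        y = routed[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


def make_expert(scale: float) -> Expert:
    """Toy stand-in for an expert FFN: scales its input."""
    return lambda v: [scale * t for t in v]


if __name__ == "__main__":
    rng = random.Random(0)
    experts = [make_expert(i + 1) for i in range(NUM_ROUTED_EXPERTS)]
    logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
    y = moe_layer([1.0, -1.0], logits, experts, shared=lambda v: list(v))
    print(len(route(logits)), "of", NUM_ROUTED_EXPERTS, "routed experts active")
    # 8 of 288 routed experts active
```

Only 8 of the 288 routed experts (about 2.8%) plus the shared expert execute for each token, which is why per-token compute tracks active rather than total parameters.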
4.3 Multi-Token Prediction (MTP)
To improve inference speed, we utilize a specialized MTP Head consisting of a sliding-window attention mechanism and a dense Feed-Forward Network (FFN). This module predicts 4 tokens simultaneously in a single forward pass, significantly accelerating inference without degrading quality.
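One common way an MTP head speeds up decoding is self-speculative decoding: the head drafts several tokens, the main model scores them all in one forward pass, and the longest prefix that agrees with the main model is kept. The card does not describe Step 3.5 Flash's exact acceptance rule, so the sketch below assumes greedy verification; `verify_draft` and its inputs are hypothetical.

```python
from typing import List, Sequence


def verify_draft(draft: Sequence[int], target: Sequence[int]) -> List[int]:
    """Greedy speculative verification (assumed acceptance rule).

    draft:  tokens proposed by the MTP head for the next len(draft) positions.
    target: the main model's greedy token at each of those positions plus one
            extra position, computed in a single forward pass over the draft
            (so len(target) == len(draft) + 1).
    Returns the tokens committed this step: the agreeing prefix of the draft,
    then the main model's token at the first disagreement (or the bonus token
    when the whole draft is accepted).
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must have exactly one more token than draft")
    accepted: List[int] = []
    for d, t in zip(draft, target):
        if d != t:
            accepted.append(t)  # main model's token replaces the rejected draft token
            return accepted
        accepted.append(d)
    accepted.append(target[-1])  # every draft token matched: keep the bonus token
    return accepted


print(verify_draft([5, 7, 9], [5, 7, 2, 4]))  # [5, 7, 2]
print(verify_draft([5, 7, 9], [5, 7, 9, 4]))  # [5, 7, 9, 4]
```

Each main-model pass therefore commits between 1 and `len(draft) + 1` tokens, and with greedy verification the output is identical to plain greedy decoding of the main model. This is how drafting can raise throughput without changing output quality.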
5. Training Codebase
The training codebase for Step 3.5 Flash is available at SteptronOss.
📜 Citation
If you find this project useful in your research,…