What does this repo signal mean?

StepFun published stepfun-ai/Step-3.5-Flash (C++). Repository signals like this one can surface tooling, evaluation, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo stepfun-ai/Step-3.5-Flash · language C++ · Notable flash model release. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

StepFun Repo: stepfun-ai/Step-3.5-Flash

Captured source

GitHub: github.com/stepfun-ai/Step-3.5-Flash

stepfun-ai/Step-3.5-Flash repository metadata

published Jan 31, 2026 · seen Jun 5 · captured Jun 11 · HTTP 200 · method plain

stepfun-ai/Step-3.5-Flash

Description: Fast, Sharp & Reliable Agentic Intelligence

Language: C++

License: Apache-2.0

Stars: 2076

Forks: 84

Open issues: 18

Created: 2026-01-31T02:57:03Z

Pushed: 2026-04-03T12:50:32Z

Default branch: main

Fork: no

Archived: no

README:

OpenClaw Guide | Claude Code Guide | Roo Code Guide | Local Agent Guide

1. Introduction

Step 3.5 Flash is our most capable open-source foundation model, engineered to deliver frontier reasoning and agentic capabilities with exceptional efficiency. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. This "intelligence density" allows it to rival the reasoning depth of top-tier proprietary models, while maintaining the agility required for real-time interaction.

- [Key Capabilities](#2-key-capabilities)
- [Performance](#3-performance)
- [Architecture Details](#4-architecture-details)
- [Quick Start](#5-quick-start)
- [Local Deployment](#6-local-deployment)
- [Using Step 3.5 Flash on Agent Platforms](#7-using-step-35-flash-on-agent-platforms)
- [Cookbooks](#8-cookbooks)
- [Known Issues and Future Directions](#9-known-issues-and-future-directions)
- [Co-Developing the Future](#10-co-developing-the-future)
- [License](#license)

2. Key Capabilities

Deep Reasoning at Speed: While chatbots are built for reading, agents must reason fast. Powered by 3-way Multi-Token Prediction (MTP-3), Step 3.5 Flash achieves a generation throughput of 100–300 tok/s in typical usage (peaking at 350 tok/s for single-stream coding tasks). This allows for complex, multi-step reasoning chains with immediate responsiveness.
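The README does not describe how MTP-3 decoding is implemented. As a rough illustration only, the sketch below shows the generic draft-and-verify accounting often used with multi-token prediction, assuming that MTP-3 means up to three draft tokens are proposed per step and that a draft is accepted only while it matches the verified token (greedy matching). The function name `tokens_emitted` and the token IDs are hypothetical.

```python
from typing import Sequence


def tokens_emitted(draft: Sequence[int], verified: Sequence[int]) -> int:
    """Count tokens produced by one decoding step under greedy verification.

    `draft` holds the speculative tokens (up to 3 under the MTP-3 assumption).
    `verified` holds the main model's own predictions at the same positions,
    plus one extra token that is always emitted.
    """
    matched = 0
    for d, v in zip(draft, verified):
        if d != v:
            break  # the first mismatch invalidates all later drafts
        matched += 1
    # The verified token at the first mismatch (or the bonus token) is kept too.
    return matched + 1


# Hypothetical token IDs: two of three drafts match, so three tokens are emitted.
print(tokens_emitted([5, 7, 9], [5, 7, 2, 4]))
```

Under these assumptions a step yields between one and four tokens, which is why throughput depends on how often the drafts are accepted.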

A Robust Engine for Coding & Agents: Step 3.5 Flash is purpose-built for agentic tasks, integrating a scalable RL framework that drives consistent self-improvement. It achieves 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0, proving its ability to handle sophisticated, long-horizon tasks with unwavering stability.

Efficient Long Context: The model supports a cost-efficient 256K context window by employing a 3:1 Sliding Window Attention (SWA) ratio—integrating three SWA layers for every full-attention layer. This hybrid approach ensures consistent performance across massive datasets or long codebases while significantly reducing the computational overhead typical of standard long-context models.
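To make the 3:1 ratio concrete, here is a minimal sketch of a layer schedule with three SWA layers per full-attention layer, together with the two attention masks. The position of the full-attention layer inside each group of four, the window size, and all function names are assumptions for illustration; the excerpt does not specify them.

```python
from typing import List, Optional


def layer_schedule(num_layers: int, swa_per_full: int = 3) -> List[str]:
    """Label each layer 'swa' or 'full' with `swa_per_full` SWA layers per full layer."""
    period = swa_per_full + 1
    # Assumption: the full-attention layer closes each group.
    return ["full" if (i + 1) % period == 0 else "swa" for i in range(num_layers)]


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        [j <= i and (window is None or i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask: List[List[bool]]) -> int:
    """Number of (query, key) pairs the mask allows; a proxy for attention cost."""
    return sum(sum(row) for row in mask)


print(layer_schedule(8))
print(attended_pairs(causal_mask(4)), attended_pairs(causal_mask(4, window=2)))
```

Full causal attention grows quadratically with sequence length, while a sliding window grows linearly, so replacing three of every four layers with SWA removes most of the long-context attention cost.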

Accessible Local Deployment: Optimized for accessibility, Step 3.5 Flash brings elite-level intelligence to local environments. It runs securely on high-end consumer hardware (e.g., Mac Studio M4 Max, NVIDIA DGX Spark), ensuring data privacy without sacrificing performance.

3. Performance

Step 3.5 Flash delivers performance parity with leading closed-source systems while remaining open and efficient.

![](assets/step-bar-chart.png)

Performance of Step 3.5 Flash measured across Reasoning, Coding, and Agentic Capabilities. Open-source models (left) are sorted by their total parameter count, while top-tier proprietary models are shown on the right. xbench-DeepSearch scores are sourced from official publications for consistency. The shadowed bars represent the enhanced performance of Step 3.5 Flash using Parallel Thinking.

Detailed Benchmarks

| Benchmark | Step 3.5 Flash | DeepSeek V3.2 | Kimi K2 Thinking / K2.5 | GLM-4.7 | MiniMax M2.1 | MiMo-V2 Flash |
| --- | --- | --- | --- | --- | --- | --- |
| # Activated Params | 11B | 37B | 32B | 32B | 10B | 15B |
| # Total Params (MoE) | 196B | 671B | 1T | 355B | 230B | 309B |
| Est. decoding cost @ 128K context, Hopper GPU | 1.0x (100 tok/s, MTP-3, EP8) | 6.0x (33 tok/s, MTP-1, EP32) | 18.9x (33 tok/s, no MTP, EP32) | 18.9x (100 tok/s, MTP-3, EP8) | 3.9x (100 tok/s, MTP-3, EP8) | 1.2x (100 tok/s, MTP-3, EP8) |
| | | | **Agent** | | | |
| τ²-Bench | 88.2 | 80.3 (85.2*) | 74.3* / 85.4* | 87.4 | 86.6* | 80.3 (84.1*) |
| BrowseComp | 51.6 | 51.4 | 41.5* / 60.6 | 52.0 | 47.4 | 45.4 |
| BrowseComp (w/ Context Manager) | 69.0 | 67.6 | 60.2 / 74.9 | 67.5 | 62.0 | 58.3 |
| BrowseComp-ZH | 66.9 | 65.0 | 62.3 / 62.3* | 66.6 | 47.8* | 51.2* |
| BrowseComp-ZH (w/ Context Manager) | 73.7 | — | — / — | — | — | — |
| GAIA (no file) | 84.5 | 75.1* | 75.6* / 75.9* | 61.9* | 64.3* | 78.2* |
| xbench-DeepSearch (2025.05) | 83.7 | 78.0* | 76.0* / 76.7* | 72.0* | 68.7* | 69.3* |
| xbench-DeepSearch (2025.10) | 56.3 | 55.7* | — / 40+ | 52.3* | 43.0* | 44.0* |
| ResearchRubrics | 65.3 | 55.8* | 56.2* / 59.5* | 62.0* | 60.2* | 54.3* |
| | | | **Reasoning** | | | |
| AIME 2025 | 97.3 | 93.1 | 94.5 / 96.1 | 95.7 | 83.0 | 94.1 (95.1*) |
| HMMT 2025 (Feb.) | 98.4 | 92.5 | 89.4 / 95.4 | 97.1 | 71.0* | 84.4 (95.4*) |
| HMMT 2025 (Nov.) | 94.0 | 90.2 | 89.2* / — | 93.5 | 74.3* | 91.0* |
| IMOAnswerBench | 85.4 | 78.3 | 78.6 / 81.8 | 82.0 | 60.4* | 80.9* |
| | | | **Coding** | | | |
| LiveCodeBench-V6 | 86.4 | 83.3 | 83.1 / 85.0 | 84.9 | — | 80.6 (81.6*) |
| SWE-bench Verified | 74.4 | 73.1 | 71.3 / 76.8 | 73.8 | 74.0 | 73.4 |
| Terminal-Bench 2.0 | 51.0 | 46.4 | 35.7* / 50.8 | 41.0 | 47.9 | 38.5 |

Notes

- "—" indicates the score is not publicly available or not tested.
- "*" indicates that the original score was unavailable or lower than our reproduced result, so we report our own evaluation under the same test conditions as Step 3.5 Flash for fair comparison.
- BrowseComp (with Context Manager): when the effective context length exceeds a predefined threshold, the agent resets the context and restarts the agent loop. (By contrast, Kimi K2.5 and DeepSeek-V3.2 used a discard-all strategy.)
- In the decoding cost row, cost is estimated using an approach similar to, but more accurate than, the one in arxiv.org/abs/2507.19427.
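The Context Manager note above can be sketched as a small agent loop that resets its context once a token threshold is exceeded. This is only an illustration under stated assumptions: the README does not say what survives a reset, so the sketch keeps just the original task, and `run_with_context_reset`, `step`, and `count_tokens` are hypothetical names rather than part of any published API.

```python
from typing import Callable, List, Optional, Tuple


def run_with_context_reset(
    task: str,
    step: Callable[[List[str]], Tuple[str, bool]],
    count_tokens: Callable[[str], int],
    max_context_tokens: int,
    max_steps: int = 50,
) -> Tuple[Optional[str], int]:
    """Run an agent loop, resetting the context when it grows past the threshold.

    `step` takes the current context and returns (message, done).
    Returns the final message (or None if max_steps is hit) and the reset count.
    """
    context = [task]
    resets = 0
    for _ in range(max_steps):
        if sum(count_tokens(m) for m in context) > max_context_tokens:
            # Assumption: a reset keeps only the original task description.
            context = [task]
            resets += 1
        message, done = step(context)
        context.append(message)
        if done:
            return message, resets
    return None, resets
```

A caller would supply a real model call for `step` and a tokenizer-based `count_tokens`; the threshold itself is described in the README only as "predefined".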

4. Architecture Details

Step 3.5 Flash is built on a Sparse Mixture-of-Experts (MoE) transformer architecture, optimized for high throughput and low VRAM usage during inference.
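As a rough illustration of what sparse activation means, the sketch below computes the active-parameter fraction from the 11B and 196B figures quoted in the introduction and shows a generic top-k expert router. The router is a common textbook formulation, not necessarily the one used in Step 3.5 Flash; the number of experts, the value of k, and the function names are hypothetical.

```python
import math
from typing import Dict, List

TOTAL_PARAMS_B = 196   # total parameters, from the README
ACTIVE_PARAMS_B = 11   # parameters activated per token, from the README


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of parameters used for a single token."""
    return active_b / total_b


def top_k_route(logits: List[float], k: int) -> Dict[int, float]:
    """Select the k highest-scoring experts and renormalise their softmax weights."""
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    peak = max(logits[e] for e in top)
    exps = {e: math.exp(logits[e] - peak) for e in top}  # shifted for stability
    total = sum(exps.values())
    return {e: w / total for e, w in exps.items()}


print(f"{active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B):.1%} of parameters active per token")
# Hypothetical router scores for 4 experts, keeping the top 2.
print(top_k_route([0.1, 2.0, -1.0, 2.0], k=2))
```

Only the selected experts run for a given token, which is how a model with a large total parameter count can keep per-token compute close to that of a much smaller dense model.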

4.1 Technical Specifications

4.2 Mixture of Experts (MoE)...

Excerpt shown — open the source for the full document.

Notability

notability 7.0/10

Notable flash model release