microsoft/FastContext-1.0-4B-SFT
Captured source
source ↗1. Model Introduction
FastContext-1.0 is a lightweight repository-exploration subagent for LLM coding agents. Instead of letting a single model both explore the repository and solve the task, FastContext separates these two roles: it is invoked on demand by a main coding agent, issues parallel read-only tool calls (READ, GLOB, GREP), and returns compact file paths and line ranges as focused context.
Repository exploration is a major bottleneck in modern coding agents — locating relevant code consumes a large share of the token budget and pollutes the solver's context with irrelevant snippets. In our analysis of GPT-5.4 trajectories, reading and searching account for 56.2% of all tool-use turns and 46.5% of the main agent's total tokens. FastContext moves this work into a dedicated subagent so the main agent receives clean, grounded evidence rather than the long trail of exploratory reads and searches.
The model family spans 4B–30B parameters, bootstrapped from strong reference-model trajectories via supervised fine-tuning (SFT) and refined with task-grounded reinforcement learning (RL) for broad first-turn search, multi-turn evidence gathering, and precise citation generation.
- Backbones: Qwen3-4B-Instruct (4B explorer) and Qwen3-Coder-30B-A3B (30B explorer)
- Variants:
FC-4B-SFT,FC-4B-RL(deployment targets),FC-30B-SFT(scaling reference) - Context length: up to 262K tokens
- Paper: *FastContext: Training Efficient Repository Explorer for Coding Agents*
- Code & data: https://github.com/microsoft/fastcontext
How it works
Coding Agent ──query──▶ FastContext ──read/search──▶ Repository ▲ │ └──── file-line ────────┘ citations
Internally, FastContext runs an exploration loop:
1. Query understanding — translate the issue into search intents. 2. Parallel tool calling — issue multiple READ / GLOB / GREP calls in a single turn to cover complementary hypotheses. 3. Observation-driven refinement — use tool outputs to guide the next search turn. 4. Final citations — return a compact `` block of file paths and line ranges.
2. Evaluation Results
End-to-end performance (Mini-SWE-Agent)
Integrating FastContext into Mini-SWE-Agent improves end-to-end resolution rates by up to 5.5% while reducing main-agent token consumption by up to 60%, with only marginal overhead. Scores, tokens, and turns are measured on the main-agent trajectory; deltas are relative to w/o Explore for the same main agent.
| Main Agent | Subagent | SWE-bench Multilingual | SWE-bench Pro | SWE-QA | |---|---|---|---|---| | GPT-5.4 | w/o Explore | 71.7 / 457k | 46.0 / 818k | 81.3 / 418k | | | FC-30B-SFT | 75.0 (↑3.3) / 356k (↓22.1%) | 49.0 (↑3.0) / 688k (↓15.9%) | 82.0 (↑0.7) / 206k (↓50.7%) | | | FC-4B-SFT | 73.3 (↑1.6) / 364k (↓20.4%) | 47.0 (↑1.0) / 689k (↓15.8%) | 81.9 (↑0.6) / 213k (↓49.0%) | | | FC-4B-RL | 74.7 (↑3.0) / 338k (↓26.0%) | 48.5 (↑2.5) / 701k (↓14.3%) | 82.0 (↑0.7) / 210k (↓49.8%) | | GLM-5.1 | w/o Explore | 72.3 / 2514k | 17.5 / 2692k | 72.7 / 401k | | | FC-30B-SFT | 73.7 (↑1.4) / 1797k (↓28.5%) | 20.0 (↑2.5) / 2370k (↓12.0%) | 73.3 (↑0.6) / 292k (↓27.2%) | | | FC-4B-SFT | 73.3 (↑1.0) / 1919k (↓23.7%) | 18.0 (↑0.5) / 2279k (↓15.3%) | 73.4 (↑0.7) / 306k (↓23.7%) | | | FC-4B-RL | 73.7 (↑1.4) / 1971k (↓21.6%) | 22.5 (↑5.0) / 2210k (↓17.9%) | 73.5 (↑0.8) / 302k (↓24.7%) | | Kimi-K2.6 | w/o Explore | 76.3 / 1553k | 31.0 / 2383k | 71.6 / 510k | | | FC-30B-SFT | 76.7 (↑0.4) / 1360k (↓12.4%) | 33.0 (↑2.0) / 2150k (↓9.8%) | 72.8 (↑1.2) / 373k (↓26.9%) | | | FC-4B-SFT | 75.3 (↓1.0) / 1306k (↓15.9%) | 32.5 (↑1.5) / 2159k (↓9.4%) | 72.6 (↑1.0) / 402k (↓21.2%) | | | FC-4B-RL | 78.3 (↑2.0) / 1384k (↓10.9%) | 33.5 (↑2.5) / 2158k (↓9.4%) | 72.6 (↑1.0) / 378k (↓25.9%) |
*Score / Tokens shown per cell. Best result per main-agent block in bold.*
Highlights:
- FastContext improves end-to-end accuracy for every main agent and benchmark; the largest gains appear on SWE-bench Pro (e.g. GPT-5.4 +5.5, GLM-5.1 +5.0).
- The biggest token savings reach 60.3% (GPT-5.4 on SWE-QA).
- The compact 4B-RL explorer can outperform the larger 30B-SFT explorer — e.g. on GLM-5.1 SWE-bench Pro it reaches 22.5 vs. 20.0 while using fewer tokens.
3. Quick Start
Launch the model with an OpenAI-compatible server (e.g. SGLang). The example below serves the 4B explorer:
python3 -m sglang.launch_server \ --model-path FastContext-1.0-4B-SFT \ --tool-call-parser qwen \ --context-length 262144 \ --trust-remote-code \ --dtype bfloat16 \ --host 0.0.0.0 \ --port 30000 \ --tp-size 1 \ --mem-fraction-static 0.8
FastContext exposes only three read-only tools to the model:
| Tool | Purpose | |---|---| | READ | Return line-numbered file contents | | GLOB | Path discovery by glob pattern | | GREP | Regex search over repository text (ripgrep-style) |
At each turn the explorer either issues one or more (parallel) tool calls or stops with a final `` evidence list. Wire FastContext into a coding agent (e.g. Mini-SWE-Agent) as an exploration subagent the main agent can invoke on demand.
4. Training Recipe
FastContext is trained in two stages:
- Supervised fine-tuning (SFT): The exploration traces, split into three sources matching the runtime behavior of the subagent —
parallel_toolcalls(broad first-turn search),multiturn_traj(multi-turn evidence gathering), andlinerange(precise citation generation). - Reinforcement learning (RL): The model is rolled out as the actual subagent and optimized with GRPO using a deterministic reward combining file- and line-level F1, a bonus for bounded parallel exploration, and format penalties.
License
This project is licensed under the MIT License.
Notability
notability 5.0/10Solid model release, not flagship.