stepfun-ai/StepDeepResearch
Description: Step-DeepResearch
Language: Python
License: Apache-2.0
Stars: 561
Forks: 26
Open issues: 3
Created: 2025-11-26T06:02:31Z
Pushed: 2026-03-24T07:08:07Z
Default branch: main
Fork: no
Archived: no
README:
Step-DeepResearch
News
- Feb 2, 2026: 👋 We have released Step 3.5 Flash, achieving 65.27 on ResearchRubrics. Try it out by setting the environment variable MODEL_NAME=step-3.5-flash.
- Dec 25, 2025: 👋 You can join our group chat to get updates on your beta API application status and the latest project developments.
- Dec 24, 2025: 👋 We have made our technical report available.
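As a rough illustration of the MODEL_NAME setting mentioned in the Feb 2 item, the sketch below shows one way a Python entry point could read the variable. The function name `resolve_model_name` and the fallback value are hypothetical: the README does not say what default the project uses or where it reads the variable.

```python
import os

# Hypothetical fallback; the README does not state the project's default model name.
DEFAULT_MODEL_NAME = "default-model-placeholder"


def resolve_model_name(env=None):
    """Return the value of MODEL_NAME, or the placeholder fallback if it is unset or blank."""
    env = os.environ if env is None else env
    value = env.get("MODEL_NAME", "").strip()
    return value or DEFAULT_MODEL_NAME


# Passing a dict stands in for the real process environment.
selected = resolve_model_name({"MODEL_NAME": "step-3.5-flash"})
```

In a shell, the same effect would come from exporting MODEL_NAME before launching the program.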
Introduction
Model Summary
Step-DeepResearch is a cost-effective, end-to-end deep research agent model designed for autonomous information exploration and professional report generation in open-ended research scenarios.
- Atomic Capability Integration: By decomposing complex research tasks into trainable atomic capabilities—including planning, information seeking, reflection and cross-validation, and professional report generation—and achieving deep internalization at the model level, the system ensures closed-loop reflection and dynamic correction within a single inference pass.
- Progressive Training Pipeline: We establish a complete optimization path from Agentic Mid-Training to Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), reshaping the training objective from "predicting the next token" to "deciding the next atomic action." This approach effectively enhances the model's adaptive capabilities and generalization performance in complex environments.
- Strong Performance Across Model Scales: With only 32B parameters, Step-DeepResearch achieves 61.4% on Scale AI ResearchRubrics, matching OpenAI Deep Research and Gemini Deep Research. In expert human evaluations on ADR-Bench, its Elo score significantly outperforms larger models including DeepSeek-v3.2 and GLM-4.6, and rivals top-tier closed-source models.
- New model (Step 3.5 Flash): Step 3.5 Flash reaches 65.27 on ResearchRubrics, delivering research quality that competes with OpenAI and Gemini Deep Research while maintaining significantly higher inference efficiency.
- Superior Cost-Effectiveness: With extremely low deployment and inference costs while maintaining expert-level research capabilities, Step-DeepResearch stands as the most cost-effective deep research agent solution currently available in the industry.
- Access: Available via StepFun Open Platform API, free for the first month.
System Architecture
Step-DeepResearch adopts a single-agent architecture based on the ReAct paradigm, enabling autonomous deep research through a dynamic cycle of reasoning, action, and reflection.
Building on our internal proprietary toolset, we provide a streamlined toolset implemented locally in the framework: batch_web_surfer (batch web search and browsing), file (file reading, writing, and editing), todo (task state management), and shell (interactive command execution). Together these support the complete research workflow. For a comprehensive introduction to the full toolset, see Section 5.2 of the technical report.
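The reasoning, action, and observation cycle described above can be sketched as a small loop. This is a minimal illustration of the general ReAct pattern, not the repository's implementation: the tool bodies are stubs, `scripted_policy` is a hypothetical stand-in for the model, and only the four tool names come from the README.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Stub tools named after the README's toolset. The bodies are placeholders
# for illustration, not the framework's real implementations.
TOOLS: Dict[str, Callable[[str], str]] = {
    "batch_web_surfer": lambda arg: f"[stub search results for: {arg}]",
    "file": lambda arg: f"[stub file operation: {arg}]",
    "todo": lambda arg: f"[stub todo update: {arg}]",
    "shell": lambda arg: f"[stub shell output for: {arg}]",
}

Step = Tuple[str, str, str]  # (thought, action name, action argument)


def scripted_policy(history: List[str]) -> Step:
    """Hypothetical stand-in for the model: choose the next step from a fixed script."""
    script = [
        ("Plan the work.", "todo", "add: survey agent memory"),
        ("Gather sources.", "batch_web_surfer", "agent memory architectures"),
        ("Write the report.", "file", "write report.md"),
        ("Done.", "finish", "report.md"),
    ]
    return script[min(len(history), len(script) - 1)]


def run_react(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> Tuple[Optional[str], List[str]]:
    """Alternate reasoning and tool calls until the policy emits 'finish' or steps run out."""
    history: List[str] = []
    for _ in range(max_steps):
        thought, action, arg = policy(history)
        if action == "finish":
            return arg, history
        tool = tools.get(action)
        observation = tool(arg) if tool else f"[unknown tool: {action}]"
        # The observation is fed back so the next decision can react to it.
        history.append(f"{thought} | {action}({arg}) -> {observation}")
    return None, history


answer, trace = run_react(scripted_policy, TOOLS)
```

In a real agent the policy would be a model call that reads the full history, and the step limit guards against loops that never reach a final answer.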
Case
We provide a sample case below, including the query and the model-generated report.
Query
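I'm an AI product manager considering adding a personalized memory feature to our product. I'd like a systematic overview of where the Memory module in today's agents is heading, including how short-term and long-term memory are implemented and the differences and trade-offs among the technical approaches. Please write a research report that analyzes, from the perspectives of engineering implementation and future evolution, which directions are most worth investing in.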
Model Output (Report)
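AI Agent Memory Modules: An In-Depth Research Report on Technical Evolution and Paths to Productization
I. Introduction and Core Problem Definition
1.1 Why Memory Has Become the Watershed of the Agent Era
In 2025, as Agentic AI spreads rapidly, agents have moved from the basic form of "can converse and call tools" toward an advanced stage of "having long-term memory and evolving autonomously" [The Road to Advanced LLMs: A Detailed Guide to Building AI Agent Memory Capabilities (Worth Bookmarking)]. As OpenAI CEO Sam Altman said in a December 2025 interview with Big Technology: "This is one of the parts of the whole system I'm personally most excited about. The next major leap for AI is not sharper reasoning but something more fundamental: memory" [A Key Piece of the Super-Agent Puzzle? Altman Singles Out "AI Memory"; the Storage Sector Gets a New Narrative]. He went on to note that "humans are inherently limited: even if you have the best personal assistant in the world, they cannot remember every word you have said, cannot have read every one of your emails, and cannot have seen every document you have written," which is precisely what AI can do [A Key Piece of the Super-Agent Puzzle? Altman Singles Out "AI Memory"; the Storage Sector Gets a New Narrative].
Many people currently assume that a larger context window or clever prompt engineering gives an AI "memory." In reality, most AI agents remain stateless: they cannot learn from past interactions or adapt to user needs over time [Exploring a Hidden Corner of AI Agents: Memory - Definition, Value, and Practice]. To move from a one-off tool to a truly intelligent partner, AI needs real memory, not just a bigger prompt or stronger retrieval.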
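1.2 The Essential Definition of Memory
In an AI agent, memory is the ability to retain and recall relevant information across time, tasks, and multiple user interactions [Exploring a Hidden Corner of AI Agents: Memory - Definition, Value, and Practice]. It lets the AI remember what happened in the past and use that information to improve future behavior. Memory is not simply storing chat logs, nor is it stuffing more data into the prompt. It is a persistent internal state that evolves with every interaction and can still provide continuous context even after gaps of weeks or months [Exploring a Hidden Corner of AI Agents: Memory - Definition, Value, and Practice].
Memory rests on three pillars: state (understanding the current situation and what is happening), persistence (retaining knowledge across sessions so that information is not lost when a conversation ends), and selectivity (judging which information is worth remembering and which can be ignored) [Exploring a Hidden Corner of AI Agents: Memory - Definition, Value, and Practice]. Together, these give AI an unprecedented capability: continuity.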
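The three pillars can be made concrete with a small sketch. The class below is illustrative only and assumes several things the report does not specify: a caller-supplied importance score stands in for the selectivity judgment, a JSON file stands in for persistent storage, and the names `PillarMemory`, `observe`, and `min_importance` are hypothetical.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class PillarMemory:
    path: str                                    # where kept facts are stored on disk
    min_importance: float = 0.5                  # hypothetical selectivity threshold
    state: dict = field(default_factory=dict)    # view of the current situation
    facts: list = field(default_factory=list)    # knowledge kept across sessions

    def __post_init__(self):
        # Persistence: reload whatever an earlier session decided to keep.
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as fh:
                self.facts = json.load(fh)

    def observe(self, text: str, importance: float) -> bool:
        """Update the current state and keep the text only if it is important enough."""
        self.state["last_message"] = text        # state
        if importance < self.min_importance:     # selectivity
            return False
        self.facts.append(text)
        with open(self.path, "w", encoding="utf-8") as fh:   # persistence
            json.dump(self.facts, fh, ensure_ascii=False)
        return True


store_path = os.path.join(tempfile.mkdtemp(), "memory.json")
session_one = PillarMemory(store_path)
session_one.observe("User prefers concise reports.", importance=0.9)
session_one.observe("User said hello.", importance=0.1)

# A later session starts with empty state but reloads the kept fact.
session_two = PillarMemory(store_path)
```

In practice the importance judgment would likely come from a model or a heuristic rather than the caller, and the file would be replaced by a database.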
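1.3 Scope and Objectives of This Report
This report systematically maps the technical evolution of agent memory modules and analyzes in depth how short-term and long-term memory are implemented and how the technical approaches differ and trade off. Along the three dimensions of business value, technical maturity, and engineering feasibility, it gives product teams clear technology-selection recommendations and a ranking of investment priorities.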
---
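II. Taxonomy and Core Architecture of Agent Memory Modules
2.1 Memory Layering: From Cognitive Science to Engineering Practice
Human memory generally progresses from sensory memory to short-term memory and then to long-term memory [A survey on large language model based autonomous agents]. Researchers draw on this progression when designing agent memory structures: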
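Short-term memory (Working Memory): the conversation the AI is currently engaged in, that is, the context immediately active "in mind." Its capacity is limited (typically a few thousand tokens), but access is extremely fast [From Theory to Practice: Applying Layered Memory Architectures in AI Agents]. For example, in the MemGPT architecture, the short-term working context consists of system instructions, the working context, and a FIFO queue [9.4k Stars! MemGPT: UC Berkeley's Latest Open-Source Project, Treating the LLM as an Operating System, with Unlimited-Context Memory and Service Deployment of Custom Agents].
Mid-term memory (Episodic Memory): comparable to the core content of a recently finished book or major recent events. These memories last longer than short-term memory but are less deeply rooted than long-term memory [From Theory to Practice: Applying Layered Memory Architectures in AI Agents]. This type of memory is typically implemented with a vector database or structured storage to support semantic retrieval.
Long-term memory (Long-term Memory): the user's personal experiences, learned skills, worldview, and knowledge already mastered. Its capacity is nearly unlimited, but retrieval may take longer [From Theory to Practice: Applying Layered Memory Architectures in AI Agents]. Long-term memory is the foundation of an agent's personalization and continual learning.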
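The three tiers above can be sketched as a single data structure. This is a simplified illustration, not MemGPT's design or any cited system's: a message count stands in for the token budget, keyword overlap stands in for vector-based semantic retrieval, and the class and method names are hypothetical.

```python
from collections import deque


class TieredMemory:
    def __init__(self, short_capacity: int = 3):
        self.short_capacity = short_capacity
        self.short_term = deque()   # FIFO working context (small, fast)
        self.mid_term = []          # evicted episodes, searched on demand
        self.long_term = {}         # durable user facts keyed by topic

    def add_message(self, text: str) -> None:
        """Append to the working context; the oldest entries overflow into the mid-term tier."""
        self.short_term.append(text)
        while len(self.short_term) > self.short_capacity:
            self.mid_term.append(self.short_term.popleft())

    def remember_fact(self, key: str, value: str) -> None:
        """Store a durable fact, such as a stable user preference."""
        self.long_term[key] = value

    def recall(self, query: str, top_k: int = 2) -> list:
        """Rank mid-term episodes by shared words with the query (a stand-in for vector search)."""
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(t.lower().split())), t) for t in self.mid_term]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]


memory = TieredMemory(short_capacity=2)
for message in ["I like hiking", "Book a flight to Tokyo", "What is the weather", "Thanks"]:
    memory.add_message(message)
memory.remember_fact("hobby", "hiking")
```

The overflow step is where a production system would usually summarize or embed the evicted text before storing it, which is the main cost and quality trade-off between tiers.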
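2.2 Technical Implementation of the Three Memory Forms
According to a recent 2025 survey, agent memory can be restructured along three orthogonal dimensions: forms, functions, and dynamics [The Most Comprehensive Memory Survey of 2025! A Unified Taxonomy of AI Agent Memory]:
**Token-level memory (Token-level…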
Notability
Notability: 6.0/10. New research repo with moderate traction.