What does this repo signal mean?

StepFun published stepfun-ai/StepFun-Prover-Preview. This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo stepfun-ai/StepFun-Prover-Preview · Low stars, routine new repo. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

StepFun Repo: stepfun-ai/StepFun-Prover-Preview

Captured source

source ↗

GitHub/github.com/stepfun-ai/StepFun-Prover-Preview

stepfun-ai/StepFun-Prover-Preview repository metadata

Source ↗

published Aug 13, 2025seen Jun 5captured Jun 11http 200method plain

stepfun-ai/StepFun-Prover-Preview

Description: Large language models designed for formal theorem proving through tool-integrated reasoning.

License: Apache-2.0

Stars: 35

Forks: 1

Open issues: 0

Created: 2025-08-13T09:19:23Z

Pushed: 2025-08-13T10:39:32Z

Default branch: main

Fork: no

Archived: no

README:

Introduction

We introduce StepFun-Prover-Preview, large language models designed for formal theorem proving through tool-integrated reasoning. Using a reinforcement learning pipeline that incorporates tool-based interactions, StepFun-Prover Preview can achieve strong performance in generating Lean 4 proofs with minimal sampling. Our approach enables the model to emulate human-like problem-solving strategies by iteratively refining proofs based on real-time environment feedback. On the miniF2F-test benchmark, StepFun-Prover-Preview-32B achieves a pass@1 success rate of 70%. Please refer to our technical report for more details.

Figure 1: Performance comparison on MiniF2F-test. y-axis shows the pass@1, which is computed by averaging 32 trials; while x-axis denotes the maximum number the provers are allowed to interact with Lean4-REPL before getting successful proof. Note both DeepSeek-Prover and Kimina-Prover utilize at least 32K token context length. Stepfun-Prover was evaluated using 20K context window including feedback from Lean4-REPL.

Methodology

Figure 2: Our training pipeline. Left. Tool-integrated RL and iterative RL-SFT cycle. The upper left illustrates data preparation. Right. Tool-integrated reasoning pattern.

Experimental Results

Tabel 1: Performance comparison of theorem proving models on the miniF2F-test dataset under minimal sampling budgets. For StepFun-Provers, we generate 32 responses per query to estimate Pass@1.

Table 2: Performance of StepFun-Prover-Preview-32B on MiniF2F-test with various maximum generation lengths. Longer response can get better results.

Model Download

License

Both the code repository and the model weights are released under the Apache License (Version 2.0).

Citation

@misc{stepfunprover2025,
title={StepFun-Prover Preview: Let's Think and Verify Step by Step},
author={Shang, Shijie and Wan, Ruosi and Peng, Yue and Wu, Yutong and Chen, Xiong-hui and Yan, Jie and Zhang, Xiangyu},
year={2025},
eprint={2507.20199},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2507.20199},
}

Notability

notability 3.0/10

Low stars, routine new repo