inclusionAI/PromptCoT
Description: A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architectures
Language: Python
License: MIT
Stars: 132
Forks: 15
Open issues: 4
Created: 2025-03-04T07:02:25Z
Pushed: 2026-01-31T06:55:27Z
Default branch: main
Fork: no
Archived: no
README: PromptCoT 2.0
Scaling Prompt Synthesis for LLM Reasoning
📄 Paper • 🤗 Hugging Face
---
✨ Overview
PromptCoT 2.0 is a principled and scalable framework for prompt synthesis that substantially advances LLM reasoning in both mathematics and programming.
It introduces an EM-style rationale-driven synthesis loop (*concept → rationale → problem*), enabling the automatic generation of diverse and challenging problems at scale. These synthetic prompts support two complementary training regimes:
- Self-Play: the model improves autonomously by learning from verifiable signals (e.g., unit tests for code, boxed answers for math). With this approach, a 30B-A3B self-play model achieves 92.1 on AIME24, 89.8 on AIME25, and 76.7 on HMMT Feb25, as well as 74.2 on LiveCodeBench v5, 71.0 on v6, and 2079 Elo on Codeforces. These results surpass strong open-source baselines (Qwen3-30B-A3B-Thinking) and are competitive with closed-source leaders such as Gemini 2.5 Pro and OpenAI o3 across math and code.
- SFT: a 7B model trained 100% on synthetic data, using prompts synthesized by PromptCoT 2.0 and complete reasoning trajectories distilled from GPT-OSS-120B (medium), reaches 73.1 on AIME24, 65.6 on AIME25, and 1815 Elo on Codeforces, outperforming counterparts trained on human-written prompts.
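To make the "verifiable signal" idea concrete for math, here is a minimal sketch of a boxed-answer check. It is not taken from this repository: the function names `extract_last_boxed` and `math_reward`, the binary 0/1 reward, and the whitespace-only normalization are all assumptions chosen for illustration. A real verifier would likely need more careful answer normalization.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: return what was collected.
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # braces never balanced


def math_reward(response: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 if the boxed answer matches the reference."""
    answer = extract_last_boxed(response)
    if answer is None:
        return 0.0

    def normalize(s: str) -> str:
        # Assumption: only whitespace differences are ignored.
        return s.replace(" ", "")

    return 1.0 if normalize(answer) == normalize(reference) else 0.0
```

In a self-play loop, a score like this could be used to decide which sampled responses count as correct; for code, the analogous signal would be whether the generated program passes its unit tests.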
Unleash the PromptCoT tide of reasoning!
---
⚡ Main Results
Self-Play @ Qwen3-30B-A3B-2507-Thinking:
PromptCoT 2.0 demonstrates that large-scale self-play with verifiable signals is effective for advancing LLM reasoning. At 30B scale, self-play achieves performance competitive with closed-source leaders (Gemini 2.5 Pro, OpenAI o3) and surpasses strong open-source baselines.
SFT @ Qwen2.5-7B-Instruct:
PromptCoT 2.0 (7B, SFT) is the first model trained entirely on synthetic prompts with trajectories distilled from GPT-OSS-120B. Compared with OpenCodeReasoning and OpenMathReasoning, both built on human-written prompts, PromptCoT 2.0 achieves stronger performance, highlighting the potential of fully synthetic prompt synthesis as a foundation for reasoning models.
---
🔮 Releases
[2025/10/26] We release the problem generation recipe (problem_generation.sh), enabling full reproduction of PromptCoT 2.0's scalable synthesis pipeline from concept files.
[2025/09/24] We release PromptCoT 2.0: the first framework to scale prompt synthesis across both math and programming, enabling 30B self-play competitive with Gemini 2.5 Pro / OpenAI o3, and 7B SFT (100% synthetic prompts) surpassing human-written baselines.
📂 Resources
- SFT Data (4.8M fully synthetic prompts + trajectories): PromptCoT-2.0-SFT-4.8M.
- SFT Model (7B): PromptCoT-2.0-SFT-7B.
- Self-Play Data: PromptCoT-2.0-SelfPlay-30B-11K and PromptCoT-2.0-SelfPlay-4B-48K.
- Self-Play Models: PromptCoT-2.0-SelfPlay-30B-A3B and PromptCoT-2.0-SelfPlay-4B.
- Problem Generation Model: PromptCoT-2.0-Prompt-Generation-Model.
[2025/05/30] We release PromptCoT-Mamba (🤗 PromptCoT-Mamba-7B): the first attention-free reasoning model, combining PromptCoT with Mamba-2 to achieve strong math & code performance with constant-memory inference.
[2025/04/11] We release PromptCoT-QwQ-32B and PromptCoT-QwQ-Dataset: self-play of QwQ-32B using PromptCoT synthetic problems, with dedicated datasets for reproducible training.
[2025/03/07] We release PromptCoT 1.0 (🤗 HF Collection): the first rationale-driven synthesis pipeline for Olympiad-level math problems, releasing problem generation models, distilled models, and datasets.
---
Quick Start
git clone https://github.com/inclusionAI/PromptCoT
cd PromptCoT
pip install -r requirements.txt
---
Configuration
Top-level scripts support loading default configuration values from a local .env file.
1. Copy .env.example to .env
2. Edit values (for example MODEL_PATH, N_GPUS, DATA_PATH, OUTPUT_PATH)
3. Validate your setup:
python validate_config.py
Notes:
- Precedence is `CLI args > .env > code defaults`.
- `MODEL_PATH` / `TOKENIZER_PATH` can be a local path or a Hugging Face model id; the validator only checks filesystem paths.
- Empty strings in `.env` are treated as "unset" (e.g. `DATA_PATH=` behaves like not set).
- Prefer namespaced environment variables (e.g. `SPLIT_MERGE_OUTPUT_PATH`, `SELF_PLAY_OUTPUT_PATH`) to avoid collisions when you run multiple scripts from the same `.env`.
- Some scripts historically used different env var names (e.g. `infer_split_merge.py` uses `N_SPLITS`, while `infer_self_play.py` uses `NUM_SPLITS`); `.env.example` documents the mapping and the code includes small fallbacks for these.
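The precedence and fallback rules in these notes can be sketched as follows. This is an illustrative sketch, not the repository's code: the helpers `env_value`, `resolve`, and `parse_args`, the `--n_splits` flag, and the placeholder default model id are all hypothetical. The sketch also assumes the `.env` values have already been loaded into a mapping (or into `os.environ`); reading the file itself is out of scope here.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def env_value(env: Mapping[str, str], names: Sequence[str]) -> Optional[str]:
    """Return the first non-empty value among names; empty strings count as unset."""
    for name in names:
        value = env.get(name, "")
        if value.strip():
            return value
    return None


def resolve(cli_value, env: Mapping[str, str], names: Sequence[str], default):
    """Apply the precedence: CLI args > environment > code default."""
    if cli_value is not None:
        return cli_value
    from_env = env_value(env, names)
    return from_env if from_env is not None else default


def parse_args(argv=None, env: Optional[Mapping[str, str]] = None):
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # default=None lets us tell "flag not given" apart from an explicit value.
    parser.add_argument("--model_path", default=None)
    parser.add_argument("--n_splits", type=int, default=None)  # hypothetical flag
    args = parser.parse_args(argv)
    args.model_path = resolve(
        args.model_path, env, ["MODEL_PATH"], "hypothetical/default-model"
    )
    # Accept either historical variable name; the first one listed wins.
    args.n_splits = int(resolve(args.n_splits, env, ["N_SPLITS", "NUM_SPLITS"], 1))
    return args
```

Passing the environment as an explicit mapping keeps the resolution logic easy to unit test without touching the real process environment.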
To run the lightweight unit tests in this repo:
python -m unittest discover -s tests -v
---
🧩 Problem Generation (Concept → Rationale → Problem)
We provide a script to synthesize problems from concept files using the PromptCoT 2.0 pipeline.
- Concept files: available at [xl-zhao/PromptCoT-2.0-Concepts](https://huggingface.co/datasets/xl-zhao/PromptCoT-2.0-Concepts) (e.g., `PromptCoT-2.0-Concepts/code.jsonl`).
- Model: set `--model_path` in the script to your PromptCoT-2.0-Prompt-Generation-Model (see Releases for links).
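Before running the generation script, it can help to inspect a few records of a concept file. The sketch below assumes only that the file is in JSON Lines format (one JSON value per line), as the `.jsonl` extension suggests; it makes no assumption about field names, and `peek_jsonl` is a hypothetical helper rather than part of this repository.

```python
import json
from itertools import islice


def peek_jsonl(path, n=3):
    """Return up to n parsed records from a JSON Lines file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        non_blank = (line for line in handle if line.strip())
        return [json.loads(line) for line in islice(non_blank, n)]
```

For example, `peek_jsonl("PromptCoT-2.0-Concepts/code.jsonl")` would return the first three records of the downloaded concept file, which you can print to see what fields it actually contains.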
**Make the…