Repo · InclusionAI (Ant Group) · published Mar 4, 2025

inclusionAI/PromptCoT

Python


Description: A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architectures

Language: Python

License: MIT

Stars: 132

Forks: 15

Open issues: 4

Created: 2025-03-04T07:02:25Z

Pushed: 2026-01-31T06:55:27Z

Default branch: main

Fork: no

Archived: no

README: PromptCoT 2.0

Scaling Prompt Synthesis for LLM Reasoning

📄 Paper • 🤗 Hugging Face

---

✨ Overview

PromptCoT 2.0 is a principled and scalable framework for prompt synthesis that substantially advances LLM reasoning in both mathematics and programming.

It introduces an EM-style rationale-driven synthesis loop (*concept → rationale → problem*), enabling the automatic generation of diverse and challenging problems at scale. These synthetic prompts support two complementary training regimes:

Self-Play: the model improves autonomously by learning from verifiable signals (e.g., unit tests for code, boxed answers for math). With this approach, a 30B-A3B self-play model achieves 92.1 on AIME24, 89.8 on AIME25, and 76.7 on HMMT Feb25, as well as 74.2 on LiveCodeBench v5, 71.0 on v6, and 2079 Elo on Codeforces. These results surpass strong open-source baselines (Qwen3-30B-A3B-Thinking) and achieve competitive performance with closed-source leaders such as Gemini 2.5 Pro and OpenAI o3 across math and code.

SFT: a 7B model trained 100% on synthetic data—using prompts synthesized by PromptCoT 2.0 and complete reasoning trajectories distilled from GPT-OSS-120B (medium)—reaches 73.1 on AIME24, 65.6 on AIME25, and 1815 Elo on Codeforces, outperforming counterparts trained on human-written prompts.
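To make the "verifiable signal" idea concrete for math, here is a minimal sketch of a boxed-answer check. It assumes completions end with a LaTeX `\boxed{...}` answer and that exact string match (ignoring spaces) is an acceptable comparison. The function names `extract_boxed` and `math_reward` are hypothetical and are not taken from this repository, whose actual verifier may normalize answers differently.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    collected = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(collected).strip()
        collected.append(ch)
        i += 1
    return None  # braces never closed


def math_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the boxed answer matches the reference, else 0.0.

    Assumption: comparison is exact after removing spaces. A real verifier
    would likely also handle equivalent forms such as 1/2 and 0.5.
    """
    answer = extract_boxed(completion)
    if answer is None:
        return 0.0
    same = answer.replace(" ", "") == reference.replace(" ", "")
    return 1.0 if same else 0.0
```

A binary signal of this kind can be computed without human labels, which is what allows the self-play regime described above to scale.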

Unleash the PromptCoT tide of reasoning!

---

⚡ Main Results

Self-Play @ Qwen3-30B-A3B-2507-Thinking:

PromptCoT 2.0 demonstrates that large-scale self-play with verifiable signals is effective for advancing LLM reasoning. At 30B scale, self-play achieves performance competitive with closed-source leaders (Gemini 2.5 Pro, OpenAI o3) and surpasses strong open-source baselines.

SFT @ Qwen2.5-7B-Instruct:

PromptCoT 2.0 (7B, SFT) is the first model trained entirely on synthetic prompts with trajectories distilled from GPT-OSS-120B. It outperforms OpenCodeReasoning and OpenMathReasoning, both of which are built on human-written prompts, highlighting the potential of fully synthetic prompt synthesis as a foundation for reasoning models.

---

🔮 Releases

[2025/10/26] We release the problem generation recipe (problem_generation.sh), enabling full reproduction of PromptCoT 2.0's scalable synthesis pipeline from concept files.

[2025/09/24] We release PromptCoT 2.0: the first framework to scale prompt synthesis across both math and programming, enabling 30B self-play competitive with Gemini 2.5 Pro / OpenAI o3, and 7B SFT (100% synthetic prompts) surpassing human-written baselines.

📂 Resources

[2025/05/30] We release PromptCoT-Mamba (🤗 PromptCoT-Mamba-7B): the first attention-free reasoning model, combining PromptCoT with Mamba-2 to achieve strong math & code performance with constant-memory inference.

[2025/04/11] We release PromptCoT-QwQ-32B and PromptCoT-QwQ-Dataset: self-play of QwQ-32B using PromptCoT synthetic problems, with dedicated datasets for reproducible training.

[2025/03/07] We release PromptCoT 1.0 (🤗 HF Collection): the first rationale-driven synthesis pipeline for Olympiad-level math problems, along with problem generation models, distilled models, and datasets.

---

Quick Start

git clone https://github.com/inclusionAI/PromptCoT
cd PromptCoT
pip install -r requirements.txt

---

Configuration

Top-level scripts support loading default configuration values from a local .env file.

1. Copy .env.example to .env
2. Edit values (for example MODEL_PATH, N_GPUS, DATA_PATH, OUTPUT_PATH)
3. Validate your setup:

python validate_config.py

Notes:

  • Precedence is CLI args > .env > code defaults.
  • MODEL_PATH / TOKENIZER_PATH can be a local path or a Hugging Face model id; the validator only checks filesystem paths.
  • Empty strings in .env are treated as "unset" (e.g. DATA_PATH= behaves like not set).
  • Prefer namespaced environment variables (e.g. SPLIT_MERGE_OUTPUT_PATH, SELF_PLAY_OUTPUT_PATH) to avoid collisions when you run multiple scripts from the same .env.
  • Some scripts historically used different env var names (e.g. infer_split_merge.py uses N_SPLITS, while infer_self_play.py uses NUM_SPLITS); .env.example documents the mapping and the code includes small fallbacks for these.
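The precedence and fallback rules in the notes above can be sketched as follows. This is an illustration under stated assumptions, not the repository's loader: `parse_env_text` and `resolve` are hypothetical helpers, and the .env syntax is assumed to be plain `KEY=VALUE` lines with optional quotes and `#` comments.

```python
from typing import Mapping, Optional, Sequence


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any (assumed .env convention).
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve(
    names: Sequence[str],
    cli_value: Optional[str],
    env: Mapping[str, str],
    default: str,
) -> str:
    """Apply the precedence CLI args > .env > code defaults.

    `names` lists the accepted variable names in priority order, so that a
    script can fall back from one historical name to another.
    """
    if cli_value is not None:
        return cli_value  # 1. an explicit CLI argument always wins
    for name in names:
        value = env.get(name, "")
        if value != "":  # empty strings count as "unset"
            return value  # 2. first non-empty .env entry
    return default  # 3. code default


env = parse_env_text("N_SPLITS=8\nDATA_PATH=\n")
num_splits = resolve(["NUM_SPLITS", "N_SPLITS"], None, env, "4")
data_path = resolve(["DATA_PATH"], None, env, "data/default.jsonl")
```

Here `num_splits` is taken from `N_SPLITS` because `NUM_SPLITS` is absent, and `data_path` falls through to the code default because `DATA_PATH=` is empty. The default values shown are placeholders chosen for the example.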

To run the lightweight unit tests in this repo:

python -m unittest discover -s tests -v

---

🧩 Problem Generation (Concept → Rationale → Problem)

We provide a script to synthesize problems from concept files using the PromptCoT 2.0 pipeline.

  • Concept files: available at [xl-zhao/PromptCoT-2.0-Concepts](https://huggingface.co/datasets/xl-zhao/PromptCoT-2.0-Concepts) (e.g., PromptCoT-2.0-Concepts/code.jsonl).
  • Model: set --model_path in the script to your PromptCoT-2.0-Prompt-Generation-Model (see Releases for links).
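Before running the generation script, it can help to inspect a concept file. The sketch below reads JSONL records and counts which keys appear. It assumes only that each non-empty line is one JSON object; the field names inside the concept files are not documented in this excerpt, so the code discovers them rather than assuming them. `iter_jsonl` and `summarize_keys` are hypothetical helper names.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line of JSONL input."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield record


def summarize_keys(records: Iterable[dict], limit: int = 100) -> Dict[str, int]:
    """Count how often each top-level key appears in the first `limit` records."""
    counts: Dict[str, int] = {}
    for index, record in enumerate(records):
        if index >= limit:
            break
        for key in record:
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Because `iter_jsonl` accepts any iterable of strings, an open file handle works directly, for example `summarize_keys(iter_jsonl(open("PromptCoT-2.0-Concepts/code.jsonl", encoding="utf-8")))` once the dataset has been downloaded to that path.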

**Make the…
