What does this repo signal mean?

Moonshot AI (Kimi) published MoonshotAI/CombiBench (Lean). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo MoonshotAI/CombiBench · language Lean · New benchmark repo with low traction.. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Moonshot AI (Kimi) Repo: MoonshotAI/CombiBench

Captured source

source ↗

GitHub/github.com/MoonshotAI/CombiBench

MoonshotAI/CombiBench repository metadata

Source ↗

published Apr 28, 2025seen Jun 5captured Jun 11http 200method plain

MoonshotAI/CombiBench

Language: Lean

License: MIT

Stars: 50

Forks: 6

Open issues: 0

Created: 2025-04-28T10:57:32Z

Pushed: 2025-12-16T11:53:09Z

Default branch: master

Fork: no

Archived: no

README:

CombiBench

CombiBench is the first benchmark focused on combinatorial problems, based on the formal language Lean 4. CombiBench is a manually produced benchmark, including 100 combinatorial mathematics problems of varying difficulty and knowledge levels. It aims to provide a benchmark for evaluating the combinatorial mathematics capabilities of automated theorem proving systems to advance the field. For problems that require providing a solution first and then proving its correctness, we have referred to the style of PutnamBench.

We are hosting a **leaderboard** and will readily receive evaluation results which are accompanied by a preprint or publication. Please reach out privately at liujunqi@amss.ac.cn with any requests for additions to the leaderboard.

Statistics

We collected all combinatorics problems from the official IMO problems since 2000, except for one problem that relies on a figure. And We selected problems through random sampling from 14 chapters in the book, choosing 3 problems from each chapter, ensuring that the 42 problems are evenly distributed across all 14 chapters. We randomly selected 10 simple combinatorics problems at the middle school level from a mathematics problem collection website hackmath. Then, we randomly collected 12 problems from other mathematics competitions.

| Source | Count | | ---------------- | -------------- | | Hackmath | 10 | | Brualdi's book | 42 | | IMO | 36 | | APMO | 2 | | Balticway | 1 | | EGMO | 1 | | IMO-Shortlist | 4 | | IZHO | 2 | | BXMO | 1 | | USAMO | 1 |

Note : The complete proofs of Problem 3 and Problem 5 from IMO 2024 have already been formalized in mathlib4/Archive/Imo2024Q3 and mathlib4/Archive/Imo2024Q5. Therefore, we directly refer to the statements of these problems, along with the necessary definitions used in the statements. We are very grateful to Joseph Myers, the author of these two problems. We also appreciate his suggestions on the formalization of our problems.

Requirements

This project requires python >= 3.10, Lean>=4.15.0.

Install the Python dependencies:

pip install -e .

Usage

Setup a Lean Server

Follow https://github.com/project-numina/kimina-lean-server to configure a lean server and get a custom url and API_KEY.

Setup a LLM API Key

We support API interfaces such as OpenAI, Antropic, TogetherAI, and Google GenerativeAI.

Configuration

Refer to evaluation/config/template.json5 to configure the dataset, lean server, llm server, generation parameters, prompt, and parallel parameters.

Run Evaluation

To run one-stage Fine-Eval:

python evaluation/cli.py online-one-stage -c evaluation/config/template.json5

To run two-stage Fine-Eval:

python evaluation/cli.py online-two-stage -c evaluation/config/template.json5

Note that both evaluation methods are compatible with theorem proving tasks and fill-in-the-blank tasks.

If you use uv, you can run the following command to quick evaluate with default settings:

uv run evaluation/cli.py online-two-stage -c evaluation/config/template.json5

More command line options can be found by running python evaluation/cli.py --help.

🙌 Contributing

Contributions are welcome! If anyone notices any mistakes, please raise an issue on the repository and we will address it.

📝 License

This project is licensed under the MIT License. See the [LICENSE](./LICENSE) file for full details.

Excerpt shown — open the source for the full document.

Notability

notability 4.0/10

New benchmark repo with low traction.