What does this repo signal mean?

Qwen (Alibaba Cloud) published QwenLM/ProcessBench (Python). A repository signal like this one can surface tooling, evaluation, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo QwenLM/ProcessBench · language Python · new benchmark repo from Qwen with moderate traction. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Qwen (Alibaba Cloud) · Repo: QwenLM/ProcessBench

Captured source

GitHub: github.com/QwenLM/ProcessBench

QwenLM/ProcessBench repository metadata

published Dec 9, 2024 · seen 6d · captured 13h · http 200 · method plain

QwenLM/ProcessBench

Description: Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"

Language: Python

Stars: 191

Forks: 18

Open issues: 7

Created: 2024-12-09T06:21:41Z

Pushed: 2025-05-20T06:43:45Z

Default branch: main

Fork: no

Archived: no

README:

ProcessBench

📄 [[paper]](https://huggingface.co/papers/2412.06559) 🤗 [[data]](https://huggingface.co/datasets/Qwen/ProcessBench)

This is the official repository for the ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning".

If you find this work relevant or helpful to your work, please kindly cite us:

@inproceedings{processbench,
  title={ProcessBench: Identifying Process Errors in Mathematical Reasoning},
  author={
    Chujie Zheng and Zhenru Zhang and Beichen Zhang and Runji Lin and Keming Lu and
    Bowen Yu and Dayiheng Liu and Jingren Zhou and Junyang Lin
  },
  booktitle={The 63rd Annual Meeting of the Association for Computational Linguistics},
  year={2025}
}

Data Usage

You can use the following code to preview the ProcessBench data:

import json
from datasets import load_dataset

dataset = load_dataset('Qwen/ProcessBench', split='gsm8k')
print(json.dumps(dataset[0], indent=2))

# Expected output:
"""
{
  "id": "gsm8k-0",
  "generator": "Qwen2-7B-Instruct",
  "problem": "Sue lives in a fun neighborhood...",
  "steps": [
    "To find out how many more pink plastic flamingos were out than...",
    ...
  ],
  "final_answer_correct": false,
  "label": 1
}
"""

Evaluation

You can refer to the [code](./code) folder for the evaluation code and the prompt templates we use in this work.
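
The official evaluation code lives in that folder and is not reproduced here. As a rough sketch of how step-index predictions on this kind of data might be scored, the snippet below assumes the metric described in the paper: accuracy on erroneous samples, accuracy on correct samples, and their harmonic mean reported as F1. The function name `score_predictions`, the -1 convention for error-free samples, and the toy inputs are all assumptions for illustration, not the repository's implementation.

```python
from typing import List, Tuple


def score_predictions(labels: List[int], preds: List[int]) -> Tuple[float, float, float]:
    """Return (acc_error, acc_correct, f1) for step-index predictions.

    Assumption: a label of -1 marks a sample with no erroneous step.
    """
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")

    # Split exact-match results by whether the sample contains an error.
    error_hits = [p == l for l, p in zip(labels, preds) if l != -1]
    correct_hits = [p == l for l, p in zip(labels, preds) if l == -1]

    acc_error = sum(error_hits) / len(error_hits) if error_hits else 0.0
    acc_correct = sum(correct_hits) / len(correct_hits) if correct_hits else 0.0

    # Harmonic mean of the two accuracies; defined as 0.0 when both are zero.
    denom = acc_error + acc_correct
    f1 = 2 * acc_error * acc_correct / denom if denom else 0.0
    return acc_error, acc_correct, f1


# Toy data: two erroneous samples (labels 1 and 3) and two error-free ones.
labels = [1, -1, 3, -1]
preds = [1, -1, 2, 0]
acc_error, acc_correct, f1 = score_predictions(labels, preds)
print(acc_error, acc_correct, f1)
```

Using the harmonic mean penalizes a model that does well on only one of the two groups, for example one that flags every solution as erroneous.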

Notability

notability 5.0/10

New benchmark repo from Qwen, moderate traction.