Repo: Qwen (Alibaba Cloud), published Aug 8, 2024

QwenLM/Qwen2.5-Math

Description: A series of math-specific large language models of our Qwen2 series.

Language: Python

Stars: 1078

Forks: 160

Open issues: 43

Created: 2024-08-08T08:55:39Z

Pushed: 2025-01-11T17:30:43Z

Default branch: main

Fork: no

Archived: no

README:

💜 Qwen Chat | 🤗 Hugging Face | 🤖 ModelScope | Kaggle | 📑 Blog | 📖 Documentation

WeChat (微信) | 🫨 Discord

Visit our Hugging Face or ModelScope organization (click the links above). Search checkpoints with names starting with Qwen2.5-Math-, and you will find all you need! Enjoy!

Introduction

A month ago, we released Qwen2-Math, the first series of mathematical LLMs in our Qwen family. Today, we have upgraded it and open-sourced the Qwen2.5-Math series, which includes the base models Qwen2.5-Math-1.5B/7B/72B, the instruction-tuned models Qwen2.5-Math-1.5B/7B/72B-Instruct, and the mathematical reward model Qwen2.5-Math-RM-72B.

Unlike the Qwen2-Math series, which only supports Chain-of-Thought (CoT) reasoning on English math problems, the Qwen2.5-Math series supports both CoT and Tool-integrated Reasoning (TIR) on math problems in Chinese and English. With CoT, the Qwen2.5-Math models achieve significant performance improvements over the Qwen2-Math models on Chinese and English mathematics benchmarks.

See this 📑 blog for detailed performance results and an introduction.

Requirements

  • transformers>=4.37.0 for Qwen2.5-Math models. The latest version is recommended.
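The version requirement above can be checked before loading any checkpoint. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `meets_requirement`, `check_transformers`) are hypothetical and not part of this repository, and the parser deliberately ignores pre-release suffixes such as `.dev0`.

```python
import re
from importlib import metadata

# Minimum version stated in the requirements above.
MIN_TRANSFORMERS = (4, 37, 0)


def parse_version(version: str) -> tuple:
    """Turn '4.37.0' or '4.38.0.dev0' into a tuple of leading integers, padded to 3 parts."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_requirement(installed: str, minimum: tuple = MIN_TRANSFORMERS) -> bool:
    """True if the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


def check_transformers() -> str:
    """Report whether the locally installed transformers package is new enough."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if meets_requirement(installed):
        return f"transformers {installed} satisfies >=4.37.0"
    return f"transformers {installed} is older than 4.37.0; please upgrade"


print(check_transformers())
```

If the check reports an older version, upgrading with `pip install -U transformers` is the usual remedy.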


For requirements on GPU memory and the respective throughput, see similar results of Qwen2 here.

Quick Start

> [!Important]
> Qwen2.5-Math-72B-Instruct is an instruction model for chatting;
> Qwen2.5-Math-72B is a base model typically used for few-shot inference, serving as a better starting point for fine-tuning.

🤗 Hugging Face Transformers

Qwen2.5-Math can be deployed and used for inference in the same way as Qwen2.5. The following snippet shows how to use the chat model with transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Math-72B-Instruct"

# device_map="auto" lets transformers decide where to place the weights
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$."

# CoT
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]

# TIR (this assignment replaces the CoT messages above; keep only the mode you want)
messages = [
    {"role": "system", "content": "Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \\boxed{}."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# move the inputs to the device that holds the model's first layers
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# drop the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
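Both system prompts in the snippet above ask the model to put its final answer within `\boxed{}`. As a rough sketch, the two helpers below select the system prompt for one mode at a time and pull the boxed answer out of a decoded response. The names `build_messages` and `extract_boxed` are hypothetical and not part of this repository; the extraction assumes the final answer is the last `\boxed{...}` in the text.

```python
from typing import Dict, List, Optional

# System prompts copied from the Quick Start snippet.
COT_SYSTEM_PROMPT = "Please reason step by step, and put your final answer within \\boxed{}."
TIR_SYSTEM_PROMPT = (
    "Please integrate natural language reasoning with programs to solve the "
    "problem above, and put your final answer within \\boxed{}."
)


def build_messages(prompt: str, mode: str = "cot") -> List[Dict[str, str]]:
    """Build the chat messages for exactly one mode: 'cot' or 'tir'."""
    system_prompts = {"cot": COT_SYSTEM_PROMPT, "tir": TIR_SYSTEM_PROMPT}
    if mode not in system_prompts:
        raise ValueError(f"mode must be 'cot' or 'tir', got {mode!r}")
    return [
        {"role": "system", "content": system_prompts[mode]},
        {"role": "user", "content": prompt},
    ]


def extract_boxed(response: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in the response, or None if absent."""
    marker = "\\boxed{"
    start = response.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    for i in range(begin, len(response)):
        ch = response[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return response[begin:i]
    return None  # braces never balanced


messages = build_messages("Find the value of $x$ that satisfies the equation $4x+5 = 6x+7$.", mode="cot")
print(messages[0]["content"])
```

With the Quick Start snippet, `messages` would be passed to `tokenizer.apply_chat_template` as before, and `extract_boxed(response)` would be called on the decoded output.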

This time, we also released a mathematical reward model, Qwen2.5-Math-RM-72B, based on Qwen2.5-Math-72B-Instruct. Qwen2.5-Math-RM-72B can be used for inference with Hugging Face Transformers by setting trust_remote_code.

> [!Warning]
> We use temperature=0.7 and top_p=0.8 for maj@8 and RM@8 sampling with Qwen2.5-Math-Instruct, while temperature=1.0 and top_p=0.9 with Qwen2-Math-Instruct.

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "Qwen/Qwen2.5-Math-RM-72B"
device = "auto" # the device to load the model onto

model = AutoModel.from_pretrained(
    model_name,
    device_map=device,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

chat = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?"},
    {"role": "assistant", "content": "To determine how much Janet makes from selling the duck eggs at the farmers' market, we need to follow these steps:\n\n1. Calculate the total number of eggs laid by the ducks each day.\n2. Determine how many eggs Janet eats and bakes for herself each day.\n3. Find out how many eggs are left to be sold.\n4. Calculate the revenue from selling the remaining eggs at $2 per egg.\n\nLet's start with the first step:\n\n1. Janet's ducks lay 16 eggs per day.\n\nNext, we calculate how many eggs Janet eats and bakes for herself each day:\n\n2. Janet eats 3 eggs for breakfast every morning.\n3. Janet bakes 4 eggs for her friends every day.\n\nSo, the total number of eggs Janet eats and bakes for herself each day is:\n\\[ 3 + 4 = 7 \\text{ eggs} \\]\n\nNow, we find out how many eggs are left to be sold:\n\\[ 16 - 7 = 9 \\text{ eggs} \\]\n\nFinally, we calculate the revenue from selling the remaining eggs at $2 per egg:\n\\[ 9 \\times 2 = 18 \\text{ dollars} \\]\n\nTherefore, Janet makes \\boxed{18} dollars every day at the farmers' market."}
] # 3.75

conversation_str = tokenizer.apply_chat_template(
    chat,
    tokenize=False,
    add_generation_prompt=False
)

input_ids = tokenizer.encode(
    conversation_str,
    return_tensors="pt",
    add_special_tokens=False
).to(model.device)

outputs = model(input_ids=input_ids)
print(outputs[0])
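The maj@8 and RM@8 settings mentioned in the warning above can be illustrated with a small sketch. It assumes that maj@k means taking the most frequent final answer among k sampled responses and that RM@k means keeping the response the reward model scores highest; the function names, answers, and scores below are hypothetical illustration data, not outputs of the models.

```python
from collections import Counter
from typing import Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """maj@k: return the most frequent final answer among k samples."""
    if not answers:
        raise ValueError("answers must be non-empty")
    return Counter(answers).most_common(1)[0][0]


def best_of_n(responses: Sequence[str], scores: Sequence[float]) -> str:
    """RM@k: return the response with the highest reward-model score."""
    if not responses or len(responses) != len(scores):
        raise ValueError("responses and scores must be non-empty and the same length")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return responses[best_index]


# Hypothetical final answers extracted from 8 sampled responses,
# and hypothetical reward scores for the same 8 responses.
answers = ["18", "18", "16", "18", "20", "18", "16", "18"]
scores = [3.75, 2.1, -0.4, 3.2, -1.0, 2.9, 0.3, 3.6]

print(majority_vote(answers))      # answer chosen by majority voting
print(best_of_n(answers, scores))  # answer of the top-scored response
```

In practice the eight samples could come from `model.generate` with `do_sample=True`, `temperature=0.7`, `top_p=0.8`, and `num_return_sequences=8`, and each score from a reward-model forward pass like the one shown above.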

🤖 ModelScope

We strongly advise users, especially those in mainland China, to use ModelScope. snapshot_download can help you solve issues concerning downloading checkpoints.
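A minimal sketch of downloading a checkpoint with `snapshot_download` follows. It assumes the `modelscope` package is installed and that the checkpoint is published on ModelScope under the same ID as on Hugging Face; the wrapper name `download_checkpoint` and the model ID in the usage comment are illustrative assumptions.

```python
from typing import Optional


def download_checkpoint(model_id: str, cache_dir: Optional[str] = None) -> str:
    """Download a checkpoint from ModelScope and return the local directory path."""
    # Imported lazily so that this module can be imported without modelscope installed.
    from modelscope import snapshot_download

    return snapshot_download(model_id, cache_dir=cache_dir)


# Example usage (requires network access and the modelscope package):
# model_dir = download_checkpoint("Qwen/Qwen2.5-Math-1.5B-Instruct")
# The returned directory can then be passed to from_pretrained in place of the model name.
```

Passing the returned local path to `AutoModelForCausalLM.from_pretrained` and `AutoTokenizer.from_pretrained` avoids a second download from Hugging Face.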

Local Demo (Qwen-Agent)

We developed a demo that supports the TIR mode in…

