stepfun-ai/Qwen2.5-32B-DialogueReason
Captured source
source ↗published May 9, 2025seen 5dcaptured 12hhttp 200method plainlicense apache-2.0params 33Bdownloads 9likes 12
Introduction
Qwen2.5-32B-DialogueReason is a dialogue-based reasoning model built on Qwen2.5-32B-Base. We train the model using Open-Reasoner-Zero data through rule-based reinforcement learning.
🧠 Key Features
- Qwen2.5-32B-Base as the foundation.
- Use Rule-Based RL to achieve dialogue reasoning.
- With dynamic agent initialization to adapt to various scenarios.
- With flexible environment configuration to set up task-specific contexts.
- With multi-turn dialogue reasoning to incrementally solve problems.
Example
System:
> The User asks a question, and the Assistant writes a masterpiece play depicting experts (picked based on the topic with concrete names) solving the question in a ultra-detailed dialogue. The response is formatted as: the play goes here\\n if asked to write code, then code here surrounded by ```. Otherwise, answer here with \\boxed{answer} emphasized.
User:
> Give me a detailed explanation of PPO in RL
Assistant:
> !image/png
Notability
notability 4.0/10Low downloads; routine fine-tune release.