What does this model signal mean?

Meituan (LongCat) published meituan-longcat/LongCat-Flash-Thinking. This model signal is evidence of what shipped on model infrastructure and how the release is positioned. High-signal details: license mit · 154 HF downloads · Fast-thinking LLM by Meituan optimized for quick reasoning tasks. onlylabs links this event to 1 captured evidence page and 6 related model signals.

Meituan (LongCat) Model: meituan-longcat/LongCat-Flash-Thinking

Captured source

Hugging Face: huggingface.co/meituan-longcat/LongCat-Flash-Thinking

meituan-longcat/LongCat-Flash-Thinking model card

published Sep 21, 2025 · seen Jun 6 · captured Jun 11 · http 200 · method plain · task text-generation · license mit · library LongCat-Flash-Chat · params 562B · downloads 154 · likes 148

LongCat-Flash-Thinking

Tech Report 📄

Model Introduction

We introduce and release LongCat-Flash-Thinking, a powerful and efficient large reasoning model (LRM) with 560 billion total parameters and an innovative Mixture-of-Experts (MoE) architecture. The model incorporates a dynamic computation mechanism that activates 18.6B to 31.3B parameters (averaging about 27B) based on contextual demands, optimizing both computational efficiency and performance. LongCat-Flash-Thinking is developed with our DORA system, an efficient distributed RL framework that supports asynchronous training and flexible accelerator usage to ensure stability and efficiency. Our comprehensive data curation and domain-parallel training recipe ensures stable and efficient training. In addition to general reasoning, the model is equipped with formal reasoning and agentic reasoning techniques, advancing LRMs' reasoning ability on diverse complex tasks such as mathematics, logic, programming, automatic theorem proving, and tool use.
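To put the activation figures in proportion, the short calculation below works out what share of the total parameters is active per token. It uses only the numbers quoted in the paragraph above; the function name `active_fraction` is illustrative and not part of any LongCat tooling.

```python
# Figures quoted in the model card, in billions of parameters.
TOTAL_PARAMS_B = 560.0
ACTIVE_MIN_B, ACTIVE_AVG_B, ACTIVE_MAX_B = 18.6, 27.0, 31.3


def active_fraction(active_b: float, total_b: float = TOTAL_PARAMS_B) -> float:
    """Return the fraction of total parameters that are activated."""
    return active_b / total_b


for label, active in [("min", ACTIVE_MIN_B), ("avg", ACTIVE_AVG_B), ("max", ACTIVE_MAX_B)]:
    print(f"{label}: {active}B of {TOTAL_PARAMS_B:.0f}B = {active_fraction(active):.1%}")
```

On these figures the model activates roughly 3.3% to 5.6% of its parameters per token, about 4.8% on average.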

Specifically, the development of LongCat-Flash-Thinking follows a two-phase pipeline:

Long CoT Cold-Start Training: This phase aims to cultivate the model's foundational reasoning abilities.

This begins with a curriculum learning strategy during mid-training to bolster intrinsic capabilities, followed by an SFT stage on reasoning-intensive and agentic data to prepare the model for advanced learning.

Large-Scale RL: The second phase scales up this potential through an efficient RL framework, built upon our Dynamic Orchestration for Asynchronous Rollout (DORA) system for industrial-scale asynchronous training.

To address the stability challenges in asynchronous RL training, we adapt and extend the GRPO algorithm for a robust exploration-exploitation balance. A key innovation in this phase is our domain-parallel training scheme, which simultaneously optimizes the model across distinct domains and subsequently merges the resulting domain-expert models into a fused model. Finally, we perform a general RL stage to further refine the fused model and enhance its robustness, safety, and human alignment ability.
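The card says the RL phase adapts and extends GRPO, but this excerpt does not describe the modifications. As background only, here is a minimal sketch of the group-relative advantage used in the standard GRPO formulation: several responses are sampled for one prompt, and each reward is normalized against the group's mean and standard deviation. The function name, the `eps` term, and the use of the population standard deviation are assumptions for illustration, not details from the LongCat report.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (baseline GRPO).

    rewards: non-empty list of scalar rewards for responses to ONE prompt.
    eps:     small constant (an assumption) to avoid division by zero
             when every response in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the choice is illustrative
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, with no separate value model needed. When all rewards in a group are equal, every advantage is zero and the group contributes no gradient signal.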

Key Features

🌟 Domain-Parallel RL Training Methodology

To overcome the instability of traditional mixed-domain RL training, LongCat-Flash-Thinking incorporates a domain-parallel training scheme that decouples optimization across STEM, coding, and agentic tasks. This approach not only stabilizes training but also allows the resulting domain-expert models to be fused into a nearly Pareto-optimal final model that excels across all specialties.
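This excerpt does not say how the domain-expert models are fused. Purely as an illustration of the general idea of merging checkpoints, the sketch below uses weighted element-wise parameter averaging, a common baseline that is assumed here and not confirmed as the method used for LongCat-Flash-Thinking. Parameters are plain Python lists keyed by hypothetical names so the example runs without any ML library.

```python
def merge_experts(experts, weights=None):
    """Weighted element-wise average of parameter dicts.

    experts: list of dicts mapping parameter name -> list of floats.
             All dicts are assumed to share the same keys and lengths.
    weights: optional per-expert weights; defaults to a uniform average.
    """
    if not experts:
        raise ValueError("need at least one expert")
    if weights is None:
        weights = [1.0] * len(experts)
    if len(weights) != len(experts):
        raise ValueError("one weight per expert is required")
    total = sum(weights)
    merged = {}
    for name in experts[0]:
        # zip(*...) walks the same position across all experts at once.
        columns = zip(*(expert[name] for expert in experts))
        merged[name] = [
            sum(w * v for w, v in zip(weights, column)) / total
            for column in columns
        ]
    return merged


# Toy checkpoints standing in for STEM, coding, and agentic experts.
stem = {"layer.w": [1.0, 2.0]}
coding = {"layer.w": [3.0, 4.0]}
agentic = {"layer.w": [5.0, 6.0]}
fused = merge_experts([stem, coding, agentic])
print(fused)
```

Uniform averaging treats every domain equally; non-uniform `weights` would bias the fused model toward one specialty.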

🌟 Pioneering RL Infrastructure

LongCat-Flash-Thinking is built upon our self-designed DORA system. The main motivation is to optimize long-tail generation by leveraging multiple old versions of the Actor model through streaming rollout while keeping sampling consistency. The DORA system consists of two core components: elastic colocation and a multi-version asynchronous pipeline. These components aim to enhance training efficiency, ensure policy consistency per sample, and further enable efficient KV-cache reuse, facilitating stable and scalable training on tens of thousands of accelerators.

🌟 Advancing Formal Reasoning and Agentic Reasoning

In addition to general reasoning (e.g., mathematics, logic, coding, instruction-following, etc.), LongCat-Flash-Thinking also emphasizes two other critical capabilities.

Formal Reasoning: LongCat-Flash-Thinking can solve complex formal reasoning tasks, e.g., automatic theorem proving. To help realize this potential and empower researchers, we significantly enhance the model's formal reasoning capabilities.

To achieve this, we introduce a novel expert iteration framework for careful data synthesis, involving statement formalization, iterative proof synthesis, and syntax/consistency filtering.
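The three stages named above (statement formalization, iterative proof synthesis, and syntax/consistency filtering) can be read as a data-collection loop. The sketch below shows one possible shape of that loop under stated assumptions: `formalize`, `propose_proof`, and `check` are hypothetical callables standing in for an autoformalizer, a prover model, and a verifier, and the retraining step that expert iteration normally runs between rounds is omitted.

```python
from typing import Callable, Iterable


def collect_verified_proofs(
    informal_problems: Iterable[str],
    formalize: Callable[[str], str],
    propose_proof: Callable[[str, int], str],
    check: Callable[[str, str], bool],
    max_attempts: int = 4,
) -> list[tuple[str, str]]:
    """Return (formal statement, proof) pairs that pass the checker."""
    verified = []
    for problem in informal_problems:
        statement = formalize(problem)          # statement formalization
        for attempt in range(max_attempts):     # iterative proof synthesis
            proof = propose_proof(statement, attempt)
            if check(statement, proof):         # syntax/consistency filtering
                verified.append((statement, proof))
                break
    return verified


# Toy stand-ins (hypothetical): the "prover" only succeeds on its third try.
def toy_formalize(problem: str) -> str:
    return f"theorem: {problem}"


def toy_propose(statement: str, attempt: int) -> str:
    return "by simp" if attempt >= 2 else "sorry"


def toy_check(statement: str, proof: str) -> bool:
    return proof != "sorry"


pairs = collect_verified_proofs(["1 + 1 = 2"], toy_formalize, toy_propose, toy_check)
print(pairs)
```

Only pairs that survive the filter are kept, so the resulting dataset contains verified proofs and nothing from failed attempts.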

Agentic Reasoning: LongCat-Flash-Thinking can adaptively utilize provided tools to solve complex reasoning tasks. To reach this goal, we introduce a dual-path reasoning approach to identify and retain high-quality queries that genuinely require tool assistance, thereby fostering the development of robust agentic abilities.

After high-value query selection, we synthesize corresponding high-quality solution trajectories based on a versatile environment with diverse tool APIs, including MCP servers and simulated tools for both single and multi-turn interactions.
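The card does not spell out the selection rule behind the dual-path approach. One plausible reading, sketched below as an assumption, is to solve each query along two paths (with and without tools) and keep the queries where tools make a clear difference. The `QueryStats` fields, the thresholds, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    query: str
    pass_rate_without_tools: float  # fraction of sampled solutions correct, no tools
    pass_rate_with_tools: float     # same measurement with tools available


def needs_tools(stats: QueryStats, max_without: float = 0.3, min_gain: float = 0.4) -> bool:
    """Keep a query only if it is hard without tools and tools help a lot.

    Both thresholds are illustrative assumptions, not values from the report.
    """
    gain = stats.pass_rate_with_tools - stats.pass_rate_without_tools
    return stats.pass_rate_without_tools <= max_without and gain >= min_gain


def select_tool_queries(all_stats):
    """Return the queries judged to genuinely require tool assistance."""
    return [s.query for s in all_stats if needs_tools(s)]


candidates = [
    QueryStats("look up live data and aggregate it", 0.1, 0.9),  # tools decisive
    QueryStats("simple arithmetic", 0.9, 0.95),                  # solvable without tools
    QueryStats("ambiguous puzzle", 0.2, 0.3),                    # tools barely help
]
selected = select_tool_queries(candidates)
print(selected)
```

Filtering this way drops queries the model can already answer unaided, as well as queries where tools do not help, leaving the ones that exercise tool use.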

For more details, please refer to the comprehensive **LongCat-Flash-Thinking Technical Report**.

Evaluation Results

| Benchmark | DeepSeek-V3.1-Thinking | Qwen3-235B-A22B-Thinking-2507 | GLM-4.5 | OpenAI-o3 | Gemini2.5-Pro | GPT-5-Thinking | LongCat-Flash-Thinking |
|---------------|-------------------------|------------------------------|--------|-----------|---------------|----------------|-------------------------|
| Architecture | MoE | MoE | MoE | - | - | - | MoE |
| \# Total Params | 671B | 235B | 355B | - | - | - | 560B |
| \# Activated Params | 37B | 22B | 32B | - | - | - | 27B |
| General QA | | | | | | | |
| MMLU-Pro(acc) | 84.4 | 84.4 | 81.5 | 85.3 | 86.7 | 84.5 | 82.6 |
| MMLU-Redux(acc) | 90.5 | 91.4 | 89.9 | 93.1 | 90.1 | 92.6 | 89.3 |
| Alignment | | | | | | | |
| IFEval(strict prompt) | 86.3 | 89.3 | 85.4 | 90.2 | 92.4 | 92.8 | 86.9 |
| Arena-Hard(hard prompt gemini) | 57.1 | 74.5 | 67.7 | 87.1 | 87.1 | 87.7 | 69.9 |
| Mathematical Reasoning | | | | | | | |
| MATH500(Mean@1) | 98.8 | 99.6 | 95.4 | 98.4 | 98.0 | 99.2 | 99.2 |
| HMMT25(Mean@32) | 80.4 | 83.8 | 76.3 | 71.9 | 79.3 | 84.8 | 83.7 |
| AIME24(Mean@32) | 93.9 | 93.9 | 89.3 | 91.6* | 90.7 | 92.0 | 93.3 |
| AIME25(Mean@32) | 87.9 | 92.5 | 85.5 | 88.9* | 89.2 | 94.6* | 90.6 |
| BeyondAIME(Mean@10) | 71.8 | 71.5 | 66.0 | 63.2 | 63.0 | 70.0 | 69.5 |
| General Reasoning | | | | | | | |
| GPQA-Diamond(Mean@16) | 84.2 | 80.4 | 78.3 | 81.9 | 84.0 | 84.4 | 81.5 |
| ZebraLogic(Mean@1) | 96.1 | 97.5 | 90.9 | 94.3 | 92.4 | 92.7 | 95.5 |
| Sudoku-Bench(Mean@1) | 1.0 | 2.0 | 1.0 | 70.0 | 0.0 | 63.0 | 56.0 |
| ARC-AGI(Mean@1) | 37.5 | 45.3 | 21.41 | 47.3 | 46.8 | 59.0 | 50.3 |
| Coding | | | | | | | |
| LiveCodeBench(Mean@4) | 73.5 | 75.4 | 61.1 | 76.2 | 74.2 | 80.6 | 79.4 |
| OJBench(Mean@1) | 33.6 | 32.1 | 19.0 | 38.4 | 41.6 | 34.1 | 40.7 |
| Agentic Tool Using | | | | | | | |
| SWE-Bench(Pass@1) | 66.0* | 34.4 | 64.2* | 69.1* | 59.6*...

Excerpt shown — open the source for the full document.

Notability

notability 2.0/10

Low traction, minor release