What does this repo signal mean?

InclusionAI (Ant Group) published inclusionAI/M2-Reasoning (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo inclusionAI/M2-Reasoning · language Python · New reasoning repo, low stars. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

InclusionAI (Ant Group) Repo: inclusionAI/M2-Reasoning

Captured source

GitHub/github.com/inclusionAI/M2-Reasoning

inclusionAI/M2-Reasoning repository metadata

published Jul 2, 2025 · seen 5d · captured 10h · http 200 · method plain

inclusionAI/M2-Reasoning

Description: M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning

Language: Python

Stars: 48

Forks: 0

Open issues: 5

Created: 2025-07-02T12:48:46Z

Pushed: 2025-07-17T07:59:39Z

Default branch: main

Fork: no

Archived: no

README:

M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning

📖 [Technical Report](./assets/M2-Reasoning.pdf) | 📄 arXiv | 🤗 Hugging Face | 🤖 ModelScope

Introduction

We introduce M2-Reasoning-7B, a model designed to excel in both general and spatial reasoning. Our approach integrates two key innovations: (1) a novel data pipeline that generates 294.2K high-quality data samples (168K for cold-start fine-tuning and 126.2K for RLVR), which feature logically coherent reasoning trajectories and have undergone comprehensive assessment; and (2) a dynamic multi-task training strategy with step-wise optimization to mitigate conflicts between data, and task-specific rewards for delivering tailored incentive signals. This combination of curated data and advanced training allows M2-Reasoning-7B to set a new state-of-the-art (SOTA) across 8 benchmarks, showcasing superior performance in both general and spatial reasoning domains. ![](assets/teaser.png)

📌 Updates

[2025.07.14] 🔥 Our Technical Report is available on 📄 arXiv.
[2025.07.11] 🔥 We release M2-Reasoning on 🤗 Hugging Face and 🤖 ModelScope.

Key Features

A High-quality Data Construction Pipeline: We design and implement a multi-stage data synthesis and curation pipeline that generates vast amounts of reasoning data.
A Dynamic Multi-Task Training Strategy: We propose a sophisticated training strategy that effectively handles data heterogeneity. It features step-wise dynamic optimization to mitigate conflicts between different data sources and a task-specific reward formulation to provide tailored incentive signals.
Unified General and Spatial Reasoning Model: We propose M2-Reasoning-7B, an MLLM uniquely engineered for both abstract and spatial reasoning. Extensive evaluations on 8 distinct benchmarks demonstrate that, by leveraging our custom data and training pipelines, M2-Reasoning establishes new state-of-the-art (SOTA) results across both general and spatial reasoning domains.
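The README does not spell out the reward formulation, so the following is only a minimal sketch of what "task-specific rewards" could look like in an RLVR setup. It assumes two illustrative task labels (`general_reasoning`, `spatial_numeric`) and two hypothetical reward functions: an exact-match reward for discrete answers and a relative-error reward for numeric estimates. None of these names or formulas come from the repository or the technical report.

```python
# Hypothetical sketch: NOT the reward functions used to train M2-Reasoning.
from typing import Callable, Dict


def exact_match_reward(prediction: str, target: str) -> float:
    """Binary reward: 1.0 when the normalized answers are identical."""
    return 1.0 if prediction.strip().lower() == target.strip().lower() else 0.0


def numeric_tolerance_reward(prediction: str, target: str) -> float:
    """Graded reward that decays linearly with relative error, clipped at 0."""
    try:
        pred, gold = float(prediction), float(target)
    except ValueError:
        return 0.0  # Unparseable answers earn no reward.
    if gold == 0.0:
        return 1.0 if pred == 0.0 else 0.0
    relative_error = abs(pred - gold) / abs(gold)
    return max(0.0, 1.0 - relative_error)


# Task names are illustrative labels, not identifiers from the repository.
REWARDS: Dict[str, Callable[[str, str], float]] = {
    "general_reasoning": exact_match_reward,
    "spatial_numeric": numeric_tolerance_reward,
}


def compute_reward(task: str, prediction: str, target: str) -> float:
    """Route a sample to the reward function registered for its task."""
    return REWARDS[task](prediction, target)
```

Routing by task type lets a multiple-choice math answer be scored strictly while a distance or size estimate still earns partial credit for being close, which is one plausible reading of "tailored incentive signals".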

Evaluation

We conduct a comprehensive evaluation of our models across two key domains: general and spatial reasoning. Our evaluation utilizes a diverse set of public benchmarks, grouped by the primary capability they measure:

General Reasoning (Mathematical & Logical): To evaluate this capability, we employ six benchmarks: MathVista, MathVision, MathVerse, DynaMath, WeMath, and LogicVista.

| Models | MathVista | MathVision | MathVerse | DynaMath | WeMath | LogicVista | Avg. (Δ) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| *Base-Scale General Models* | | | | | | | |
| InternVL3-8B | 70.5 | 30.0 | 38.5 | 25.7 | 39.5 | 44.5 | 41.4 |
| InternVL3-9B | 69.0 | 29.3 | 37.9 | 25.1 | 34.8 | 49.0 | 40.8 |
| Qwen2.5-VL-7B | 68.1 | 25.4 | 41.1 | 21.8 | 36.2 | 47.9 | 40.1 |
| MUG-U-7B | 74.8 | 26.1 | 35.4 | 17.2 | 26.5 | 39.8 | 36.6 |
| SAIL-VL-1.6-8B | 74.2 | 23.2 | 33.4 | 14.0 | 29.6 | 41.4 | 36.0 |
| *Base-Scale Reasoning Models* | | | | | | | |
| WeThink-VL-7B | 71.6 | 26.0 | 44.2 | 24.8 | 48.0 | 51.2 | 44.3 (+4.2) |
| Taichu-VLR-7B | 72.3 | 27.1 | 46.7 | 23.0 | 44.0 | 48.3 | 43.6 |
| VLAA-Thinker-7B | 68.0 | 26.4 | 48.2 | 22.4 | 41.5 | 48.5 | 42.5 (+2.4) |
| URSA-8B-PS-GRPO | 67.8 | 31.8 | 41.5 | 22.4 | 38.3 | 44.7 | 41.1 (+8.2) |
| Ovis2-8B | 71.8 | 25.9 | 42.3 | 20.4 | 27.2 | 39.4 | 37.8 |
| *Our Models* | | | | | | | |
| Base Model | 70.2 | 25.9 | 30.5 | 20.2 | 27.2 | 37.8 | 35.5 |
| M2-Reasoning-CI-7B | 71.7 | 29.2 | 42.1 | 25.0 | 42.8 | 46.8 | 42.9 (+7.4) |
| M2-Reasoning-7B | 75.0 | 31.5 | 44.7 | 26.8 | 41.8 | 50.0 | 45.0 (+9.5) |
| M2-Reasoning-7B-HF* | 74.7 | 30.5 | 46.1 | 26.8 | 42.7 | 49.2 | 45.0 (+9.5) |

\* After converting the checkpoints to Hugging Face format, the accuracies differ slightly.
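As a reading aid for the Avg. (Δ) column, the short sketch below recomputes the M2-Reasoning-7B entry from the table above. It assumes that Avg. is the unweighted mean of the six benchmark scores and that Δ for the "Our Models" rows is the difference from the reported Base Model average; the README does not state either definition explicitly, so treat both as assumptions that happen to match the reported numbers for this row.

```python
from statistics import mean

BENCHMARKS = ["MathVista", "MathVision", "MathVerse", "DynaMath", "WeMath", "LogicVista"]

# Scores copied from the M2-Reasoning-7B row of the general reasoning table.
m2_reasoning_7b = [75.0, 31.5, 44.7, 26.8, 41.8, 50.0]

# Average reported in the "Base Model" row (taken as given, not recomputed).
base_model_avg = 35.5

# Assumption: Avg. is the unweighted mean, rounded to one decimal place.
avg = round(mean(m2_reasoning_7b), 1)

# Assumption: the delta is measured against the base model's reported average.
delta = round(avg - base_model_avg, 1)

print(f"Avg. = {avg}, Δ = +{delta}")  # prints "Avg. = 45.0, Δ = +9.5"
```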

Spatial Reasoning: We assess this skill using 2 benchmarks: CV-Bench and VSI-Bench.
CV-Bench:

| Models | Count | Relation | Depth | Distance | Avg. |
| :--- | :---: | :---: | :---: | :---: | :---: |
| *Large-Scale Models* | | | | | |
| GPT-4O | 65.9 | 85.7 | 87.8 | 78.2 | 78.9 |
| Gemini-1.5-pro | 70.4 | 85.2 | 82.4 | 72.8 | 77.4 |
| *Base-Scale Models* | | | | | |
| InternVL3-8B | 74.0 | 90.6 | 84.3 | 81.0 | 82.0 |
| Qwen2.5-VL-7B-Instruct | 65.2 | 86.6 | 70.6 | 79.8 | 75.0 |
| LLava-NEXT-Video-7B | 59.3 | 77.0 | 71.3 | 54.7 | 65.2 |
| *Our Models* | | | | | |
| M2-Reasoning-7B | 66.6 | 92.8 | 89.3 | 84.3 | 82.3 |

VSI-Bench:

| Models | OC | AD | OS | RS | RDs | RDr | RP | AO | Avg. |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| *Large-Scale Models* | | | | | | | | | |
| Gemini-1.5-pro | 56.2 | 30.9 | 64.1 | 43.6 | 51.3 | 46.3 | 36.0 | 34.6 | 45.4 |
| GPT-4O | 46.2 | 5.3 | 43.8 | 38.2 | 37.0 | 41.3 | 31.5 | 28.5 | 34.0 |
| *Base-Scale Models* | | | | | | | | | |
| InternVL3-8B | 68.1 | 39.0 | 48.4 | 33.6 | 48.3 | 36.4 | 27.3 | 35.4 | 42.1 |
| Video-R1-7B | - | - | - | - | - | - | - | - | 37.1 |
| Qwen2.5-VL-7B-Instruct | 37.7 | 20.1 | 49.7 | 37.4 | 38.5 | 40.4 | 31.4 | 32.0 | 35.9 |
| LLava-NeXT-Video-7B | 48.5 | 14.0 | 47.8 | 24.2 | 43.5 | 42.4 | 34.0 | 30.6 | 35.6 |
| *Our Models* | | | | | | | | | |
| M2-Reasoning-7B | 41.0 | 34.0 | 60.9 | 55.4 | 40.7 | 47.3 | 29.9 | 28.8 | 42.3 |

Model Downloads

You can download the model from both 🤗 Hugging Face and 🤖 ModelScope.

Installation

Download the model as described in Model Downloads, then use the code below to run M2-Reasoning. The basic environment is python=3.10, torch=2.6.0+cu124, transformers=4.49.0.
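Before running the example, it can help to confirm that the local environment matches these pins. The sketch below is not part of the repository; `find_mismatches`, `REQUIRED`, and `REQUIRED_PYTHON` are hypothetical names, and the check assumes that only the base version matters, so a local build tag such as `+cu124` is stripped before comparing.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List

# Pins taken from the README; the local build tag (+cu124) is ignored when comparing.
REQUIRED = {"torch": "2.6.0", "transformers": "4.49.0"}
REQUIRED_PYTHON = (3, 10)


def find_mismatches(
    required: Dict[str, str],
    lookup: Callable[[str], str] = version,
) -> List[str]:
    """Return one message per package that is missing or has a different version."""
    problems = []
    for name, wanted in required.items():
        try:
            found = lookup(name).split("+")[0]  # "2.6.0+cu124" -> "2.6.0"
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (want {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems


if __name__ == "__main__":
    if sys.version_info[:2] != REQUIRED_PYTHON:
        print(f"python: found {sys.version_info.major}.{sys.version_info.minor}, want 3.10")
    for line in find_mismatches(REQUIRED):
        print(line)
```

The `lookup` parameter defaults to `importlib.metadata.version` and exists only so the check can be exercised without installing the packages. Nearby versions may work as well; the README lists these three only as the basic environment.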

Example Usage

We provide a small example of how to use this repo.

import os
import torch

from transformers import (
    AutoProcessor,
    AutoTokenizer,
)

import warnings
import argparse
from modeling_bailing_qwen2_5 import Bailing_qwen2_5NativeForConditionalGeneration
from processing_bailing_qwen2_5 import…

Excerpt shown — open the source for the full document.

Notability

notability 4.0/10

New reasoning repo, low stars