What does this model signal mean?

Qwen (Alibaba Cloud) published Qwen/Qwen3.5-2B. This model signal is evidence of what shipped on model infrastructure and how the release is positioned. High-signal details: license apache-2.0 · 2.3M HF downloads · High-download notable release from Qwen. onlylabs links this event to 1 captured evidence page and 6 related model signals.

Qwen (Alibaba Cloud) Model: Qwen/Qwen3.5-2B

Captured source

Hugging Face · huggingface.co/Qwen/Qwen3.5-2B

Qwen/Qwen3.5-2B model card

published Feb 28, 2026 · seen Jun 6 · captured Jun 11 · http 200 · method plain · task image-text-to-text · license apache-2.0 · library transformers · params 2.3B · downloads 2326k · likes 338

Qwen3.5-2B

> [!Note]
> This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format.
>
> These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.
>
> In light of its parameter scale, the intended use cases are prototyping, task-specific fine-tuning, and other research or development purposes.

Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility to empower developers and enterprises with unprecedented capability and efficiency.

Qwen3.5 Highlights

Qwen3.5 features the following enhancements:

Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.

Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.

Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.

Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.

Next-Generation Training Infrastructure: Near-100% multimodal training efficiency compared to text-only training and asynchronous RL frameworks supporting massive-scale agent scaffolds and environment orchestration.

For more details, please refer to our blog post Qwen3.5.

Model Overview

Type: Causal Language Model with Vision Encoder
Training Stage: Pre-training & Post-training
Language Model
  Number of Parameters: 2B
  Hidden Dimension: 2048
  Token Embedding: 248320 (Padded)
  Number of Layers: 24
  Hidden Layout: 6 × (3 × (Gated DeltaNet → FFN) → 1 × (Gated Attention → FFN))
  Gated DeltaNet:
    Number of Linear Attention Heads: 16 for V and 16 for QK
    Head Dimension: 128
  Gated Attention:
    Number of Attention Heads: 8 for Q and 2 for KV
    Head Dimension: 256
    Rotary Position Embedding Dimension: 64
  Feed Forward Network:
    Intermediate Dimension: 6144
  LM Output: 248320 (Tied to token embedding)
  MTP: trained with multi-steps
Context Length: 262,144 natively
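
The figures in the overview can be cross-checked with a little arithmetic. The sketch below expands the hidden layout into a per-layer list, derives the grouped-query ratio of the gated attention layers, and estimates the KV cache at the full native context. The constants are copied from the list above; the helper names (`expand_layout`, `kv_cache_bytes`) are hypothetical, and the cache estimate assumes 2-byte (bf16/fp16) entries and that only the Gated Attention layers keep a cache that grows with sequence length. It is an illustration, not the model's actual implementation.

```python
# Values copied from the Model Overview list; everything else is derived.
HIDDEN_DIM = 2048
VOCAB_PADDED = 248_320
NUM_LAYERS = 24
CONTEXT_LEN = 262_144
Q_HEADS, KV_HEADS, ATTN_HEAD_DIM = 8, 2, 256


def expand_layout(groups=6, deltanet_per_group=3, attention_per_group=1):
    """Expand 6 x (3 x Gated DeltaNet -> 1 x Gated Attention) into a per-layer list."""
    group = ["gated_deltanet"] * deltanet_per_group
    group += ["gated_attention"] * attention_per_group
    return group * groups


layout = expand_layout()
attention_layers = layout.count("gated_attention")


def kv_cache_bytes(tokens, bytes_per_value=2):
    """Estimate the KV cache held by the gated attention layers.

    Assumptions (not stated in the model card): 2-byte cache entries, and
    Gated DeltaNet layers keep a fixed-size state that is ignored here.
    """
    # Factor 2 covers the separate K and V tensors.
    per_token = 2 * KV_HEADS * ATTN_HEAD_DIM * attention_layers * bytes_per_value
    return per_token * tokens


# The LM output is tied to the token embedding, so the matrix is counted once.
embedding_params = VOCAB_PADDED * HIDDEN_DIM

print("layers:", len(layout), "of which gated attention:", attention_layers)
print("query heads per KV head:", Q_HEADS // KV_HEADS)
print("KV cache at full context (GiB):", kv_cache_bytes(CONTEXT_LEN) / 2**30)
print("token embedding parameters:", embedding_params)
```

Under these assumptions the layout expands to the stated 24 layers, of which 6 are gated attention, and the token embedding alone accounts for roughly 0.5B parameters.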

Benchmark Results

Language

Benchmark Qwen3-4B-2507 Qwen3-1.7B Qwen3.5-2B Qwen3.5-0.8B

Instruct (Non-Thinking) Mode

MMLU-Pro 69.6 40.2 55.3 29.7

MMLU-Redux 84.2 64.4 69.2 48.5

C-Eval 80.2 61.0 65.2 46.4

SuperGPQA 42.8 21.0 30.4 16.9

IFEval 83.4 68.2 61.2 52.1

MMMLU 64.9 46.7 56.9 34.1

Knowledge & STEM (Thinking)

MMLU-Pro 74.0 56.5 66.5 42.3

MMLU-Redux 86.1 73.9 79.6 59.5

C-Eval 82.2 68.1 73.2 50.5

SuperGPQA 47.8 31.2 37.5 21.3

GPQA 65.8 40.1 51.6 11.9

Instruction Following (Thinking)

IFEval 87.4 72.5 78.6 44.0

IFBench 50.4 26.7 41.3 21.0

MultiChallenge 41.7 27.2 33.7 18.9

Long Context (Thinking)

AA-LCR 32.0 6.7 25.6 4.7

LongBench v2 42.8 26.5 38.7 26.1

Reasoning (Thinking)

HMMT Feb 25 57.5 10.2 22.9 --

HMMT Nov 25 69.6 8.9 19.6 --

General Agent (Thinking)

BFCL-V4 39.9 -- 43.6 25.3

TAU2-Bench 43.2 -- 48.8 11.6

Multilingualism (Thinking)

MMMLU 70.8 57.0 63.1 44.3

MMLU-ProX 62.4 49.4 52.3 34.6

NOVA-63 47.1 40.3 46.4 42.4

INCLUDE 64.4 51.8 55.4 40.6

Global PIQA 73.5 63.1 69.3 59.4

PolyMATH 46.2 25.2 26.1 8.2

WMT24++ 58.9 39.3 45.8
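
Six benchmarks appear in both the Instruct (Non-Thinking) block and the Thinking blocks, so the effect of thinking mode on Qwen3.5-2B can be read off directly. The sketch below copies only the Qwen3.5-2B column from the table and computes the per-benchmark difference; the function name `thinking_gain` is hypothetical, and the comparison assumes both modes were scored under the same evaluation setup, which the page does not confirm.

```python
# Qwen3.5-2B column only, copied from the benchmark table above.
NON_THINKING = {
    "MMLU-Pro": 55.3,
    "MMLU-Redux": 69.2,
    "C-Eval": 65.2,
    "SuperGPQA": 30.4,
    "IFEval": 61.2,
    "MMMLU": 56.9,
}
THINKING = {
    "MMLU-Pro": 66.5,
    "MMLU-Redux": 79.6,
    "C-Eval": 73.2,
    "SuperGPQA": 37.5,
    "IFEval": 78.6,
    "MMMLU": 63.1,
}


def thinking_gain(non_thinking, thinking):
    """Return thinking-minus-non-thinking score for every shared benchmark."""
    return {
        name: round(thinking[name] - score, 1)
        for name, score in non_thinking.items()
        if name in thinking
    }


gains = thinking_gain(NON_THINKING, THINKING)

# Print the largest differences first.
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<11} {gain:+.1f}")
```

On these numbers the largest difference is on IFEval and the smallest on MMMLU.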

Notability

notability 8.0/10

High-download notable release from Qwen