What does this model signal mean?

Qwen (Alibaba Cloud) published Qwen/Qwen3.5-35B-A3B. This model signal is evidence of what shipped on model infrastructure and how the release is positioned. High-signal details: license apache-2.0 · 2.2M HF downloads · Major model release with massive traction. onlylabs links this event to 1 captured evidence page and 6 related model signals.

Qwen (Alibaba Cloud) Model: Qwen/Qwen3.5-35B-A3B

Captured source

source ↗

Hugging Face/huggingface.co/Qwen/Qwen3.5-35B-A3B

Qwen/Qwen3.5-35B-A3B model card

Source ↗

published Feb 24, 2026seen Jun 6captured Jun 11http 200method plaintask image-text-to-textlicense apache-2.0library transformersparams 36Bdownloads 2211klikes 1.4k

Qwen3.5-35B-A3B

> [!Note] > This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. > > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.

> [!Tip] > For users seeking managed, scalable inference without infrastructure maintenance, the official Qwen API service is provided by Alibaba Cloud Model Studio. > > In particular, Qwen3.5-Flash is the hosted version corresponding to Qwen3.5-35B-A3B with more production features, e.g., 1M context length by default and official built-in tools. > For more information, please refer to the User Guide.

Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility to empower developers and enterprises with unprecedented capability and efficiency.

Qwen3.5 Highlights

Qwen3.5 features the following enhancement:

Unified Vision-Language Foundation: Early fusion training on multimodal tokens achieves cross-generational parity with Qwen3 and outperforms Qwen3-VL models across reasoning, coding, agents, and visual understanding benchmarks.

Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.

Scalable RL Generalization: Reinforcement learning scaled across million-agent environments with progressively complex task distributions for robust real-world adaptability.

Global Linguistic Coverage: Expanded support to 201 languages and dialects, enabling inclusive, worldwide deployment with nuanced cultural and regional understanding.

Next-Generation Training Infrastructure: Near-100% multimodal training efficiency compared to text-only training and asynchronous RL frameworks supporting massive-scale agent scaffolds and environment orchestration.

!Benchmark Results

For more details, please refer to our blog post Qwen3.5.

Model Overview

Type: Causal Language Model with Vision Encoder
Training Stage: Pre-training & Post-training
Language Model
Number of Parameters: 35B in total and 3B activated
Hidden Dimension: 2048
Token Embedding: 248320 (Padded)
Number of Layers: 40
Hidden Layout: 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
Gated DeltaNet:
Number of Linear Attention Heads: 32 for V and 16 for QK
Head Dimension: 128
Gated Attention:
Number of Attention Heads: 16 for Q and 2 for KV
Head Dimension: 256
Rotary Position Embedding Dimension: 64
Mixture Of Experts
Number of Experts: 256
Number of Activated Experts: 8 Routed + 1 Shared
Expert Intermediate Dimension: 512
LM Output: 248320 (Padded)
MTP: trained with multi-steps
Context Length: 262,144 natively and extensible up to 1,010,000 tokens.

Benchmark Results

Language

GPT-5-mini 2025-08-07 GPT-OSS-120B Qwen3-235B-A22B Qwen3.5-122B-A10B Qwen3.5-27B Qwen3.5-35B-A3B

Knowledge

MMLU-Pro 83.7 80.8 84.4 86.7 86.1 85.3

MMLU-Redux 93.7 91.0 93.8 94.0 93.2 93.3

C-Eval 82.2 76.2 92.1 91.9 90.5 90.2

SuperGPQA 58.6 54.6 64.9 67.1 65.6 63.4

Instruction Following

IFEval 93.9 88.9 87.8 93.4 95.0 91.9

IFBench 75.4 69.0 51.7 76.1 76.5 70.2

MultiChallenge 59.0 45.3 50.2 61.5 60.8 60.0

Long Context

AA-LCR 68.0 50.7 60.0 66.9 66.1 58.5

LongBench v2 56.8 48.2 54.8 60.2 60.6 59.0

STEM & Reasoning

HLE w/ CoT 19.4 14.9 18.2 25.3 24.3 22.4

GPQA Diamond 82.8 80.1 81.1 86.6 85.5 84.2

HMMT Feb 25 89.2 90.0 85.1 91.4 92.0 89.0

HMMT Nov 25 84.2 90.0 89.5 90.3 89.8 89.2

Coding

SWE-bench Verified 72.0 62.0 -- 72.0 72.4 69.2

Terminal Bench 2 31.9 18.7 -- 49.4 41.6 40.5

LiveCodeBench v6 80.5 82.7 75.1 78.9 80.7 74.6

CodeForces 2160 2157 2146 2100 1899 2028

OJBench 40.4 41.5 32.7 39.5 40.1 36.0

FullStackBench en 30.6 58.9 <td style="padding:7px 7px;text-align:center;border-bottom:1px solid rgba(128, 128, 128, 0.1

Notability

notability 10.0/10

Major model release with massive traction