What does this model signal mean?

Qwen (Alibaba Cloud) published Qwen/Qwen3.6-35B-A3B. This model signal records what shipped and how the release is positioned. High-signal details: license apache-2.0 · 6.4M HF downloads · huge traction, frontier model release. onlylabs links this event to 1 captured evidence page and 6 related model signals.

Qwen (Alibaba Cloud) Model: Qwen/Qwen3.6-35B-A3B

Captured source

Hugging Face: huggingface.co/Qwen/Qwen3.6-35B-A3B

Qwen/Qwen3.6-35B-A3B model card

published Apr 15, 2026 · seen Jun 6 · captured Jun 11 · http 200 · method plain · task image-text-to-text · license apache-2.0 · library transformers · params 36B · downloads 6413k · likes 2.5k

Qwen3.6-35B-A3B

> [!Note]
> This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format.
>
> These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.

Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience.

Qwen3.6 Highlights

This release delivers substantial upgrades, particularly in:

Agentic Coding: the model now handles frontend workflows and repository-level reasoning with greater fluency and precision.
Thinking Preservation: we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead.

Figure: Benchmark Results

For more details, please refer to our blog post Qwen3.6-35B-A3B.

Model Overview

Type: Causal Language Model with Vision Encoder
Training Stage: Pre-training & Post-training
Language Model:
  Number of Parameters: 35B in total and 3B activated
  Hidden Dimension: 2048
  Token Embedding: 248320 (Padded)
  Number of Layers: 40
  Hidden Layout: 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
  Gated DeltaNet:
    Number of Linear Attention Heads: 32 for V and 16 for QK
    Head Dimension: 128
  Gated Attention:
    Number of Attention Heads: 16 for Q and 2 for KV
    Head Dimension: 256
    Rotary Position Embedding Dimension: 64
  Mixture Of Experts:
    Number of Experts: 256
    Number of Activated Experts: 8 Routed + 1 Shared
    Expert Intermediate Dimension: 512
  LM Output: 248320 (Padded)
  MTP: trained with multi-steps
Context Length: 262,144 natively and extensible up to 1,010,000 tokens.
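
The hidden layout and attention figures above can be turned into a quick sizing estimate. The sketch below expands the 10 × (3 + 1) layout into a per-layer list and estimates the key/value cache at the native context length. It rests on two assumptions that the model card does not state: the cache is stored at 2 bytes per element (bf16/fp16), and only the Gated Attention layers keep a cache that grows with sequence length, while the Gated DeltaNet layers are treated as holding a fixed-size state. The vision encoder is ignored. All function and constant names are illustrative.

```python
from collections import Counter

# Values copied from the Model Overview list.
NUM_LAYERS = 40
BLOCK_REPEATS = 10
DELTANET_PER_BLOCK = 3
ATTENTION_PER_BLOCK = 1
KV_HEADS = 2            # "2 for KV" in Gated Attention
ATTN_HEAD_DIM = 256
NUM_EXPERTS = 256
ROUTED_ACTIVE = 8
NATIVE_CONTEXT = 262_144

# Assumption (not in the model card): cache elements are bf16/fp16.
BYTES_PER_ELEMENT = 2


def build_layout():
    """Expand 10 x (3 x Gated DeltaNet -> 1 x Gated Attention) into a list."""
    block = (["gated_deltanet"] * DELTANET_PER_BLOCK
             + ["gated_attention"] * ATTENTION_PER_BLOCK)
    return block * BLOCK_REPEATS


def kv_cache_bytes(num_tokens, bytes_per_element=BYTES_PER_ELEMENT):
    """Estimate K+V cache size, counting only the Gated Attention layers."""
    attention_layers = build_layout().count("gated_attention")
    # Factor 2 covers one key tensor plus one value tensor per layer.
    per_token = attention_layers * KV_HEADS * ATTN_HEAD_DIM * 2 * bytes_per_element
    return per_token * num_tokens


layout = build_layout()
counts = Counter(layout)
assert len(layout) == NUM_LAYERS  # the expansion matches "Number of Layers: 40"

routed_fraction = ROUTED_ACTIVE / NUM_EXPERTS

print("DeltaNet layers:", counts["gated_deltanet"])
print("Attention layers:", counts["gated_attention"])
print("KV cache at native context (GiB):", kv_cache_bytes(NATIVE_CONTEXT) / 2**30)
print("Routed experts used per token:", f"{routed_fraction:.3%}")
```

Under these assumptions only 10 of the 40 layers contribute to the per-token cache, which is why the estimate stays small relative to a model where every layer uses full attention. Real deployments will differ depending on cache dtype, batching, and how the serving framework stores the linear-attention state.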

Benchmark Results

Language

| Benchmark | Qwen3.5-27B | Gemma4-31B | Qwen3.5-35BA3B | Gemma4-26BA4B | Qwen3.6-35BA3B |
|---|---|---|---|---|---|
| Coding Agent | | | | | |
| SWE-bench Verified | 75.0 | 52.0 | 70.0 | 17.4 | 73.4 |
| SWE-bench Multilingual | 69.3 | 51.7 | 60.3 | 17.3 | 67.2 |
| SWE-bench Pro | 51.2 | 35.7 | 44.6 | 13.8 | 49.5 |
| Terminal-Bench 2.0 | 41.6 | 42.9 | 40.5 | 34.2 | 51.5 |
| Claw-Eval Avg | 64.3 | 48.5 | 65.4 | 58.8 | 68.7 |
| Claw-Eval Pass^3 | 46.2 | 25.0 | 51.0 | 28.0 | 50.0 |
| SkillsBench Avg5 | 27.2 | 23.6 | 4.4 | 12.3 | 28.7 |
| QwenClawBench | 52.2 | 41.7 | 47.7 | 38.7 | 52.6 |
| NL2Repo | 27.3 | 15.5 | 20.5 | 11.6 | 29.4 |
| QwenWebBench | 1068 | 1197 | 978 | 1178 | 1397 |
| General Agent | | | | | |
| TAU3-Bench | 68.4 | 67.5 | 68.9 | 59.0 | 67.2 |
| VITA-Bench | 41.8 | 43.0 | 29.1 | 36.9 | 35.6 |
| DeepPlanning | 22.6 | 24.0 | 22.8 | 16.2 | 25.9 |
| Tool Decathlon | 31.5 | 21.2 | 28.7 | 12.0 | 26.9 |
| MCPMark | 36.3 | 18.1 | 27.0 | 14.2 | 37.0 |
| MCP-Atlas | 68.4 | 57.2 | 62.4 | 50.0 | 62.8 |
| WideSearch | 66.4 | 35.2 | 59.1 | 38.3 | 60.1 |
| Knowledge | | | | | |
| MMLU-Pro | 86.1 | 85.2 | 85.3 | 82.6 | 85.2 |
| MMLU-Redux | 93.2 | 93.7 | 93.3 | 92.7 | 93.3 |
| SuperGPQA | 65.6 | 65.7 | 63.4 | 61.4 | 64.7 |
| C-Eval | 90.5 | 82.6 | 90.2 | 82.5 | 90.0 |
| STEM & Reasoning | | | | | |
| GPQA | 85.5 | 84.3 | 84.2 | 82.3 | 86.0 |
| HLE | 24.3 | 19.5 | 22.4 | 8.7 | 21.4 |
| LiveCodeBench v6 | 80.7 | 80.0 | 74.6 | 77.1 | … |
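
One way to read the Coding Agent rows is as a generation-over-generation comparison between the two 35B-A3B models. The sketch below computes the point difference between the Qwen3.5-35BA3B and Qwen3.6-35BA3B columns for a handful of rows. The scores are copied from the rows above; the selection of rows, the dictionary, and the helper name are illustrative choices, not part of the model card.

```python
# Scores copied from the Coding Agent rows.
# Each tuple is (Qwen3.5-35BA3B, Qwen3.6-35BA3B).
CODING_AGENT_SCORES = {
    "SWE-bench Verified": (70.0, 73.4),
    "SWE-bench Multilingual": (60.3, 67.2),
    "SWE-bench Pro": (44.6, 49.5),
    "Terminal-Bench 2.0": (40.5, 51.5),
    "NL2Repo": (20.5, 29.4),
}


def score_deltas(scores):
    """Return new-minus-old differences, rounded to one decimal place."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


deltas = score_deltas(CODING_AGENT_SCORES)

# List benchmarks from largest to smallest gain.
for name, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {delta:+.1f}")
```

These are raw point differences on benchmarks with different scales and difficulty, so they indicate where the newer model moved most rather than how much better it is overall.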

Notability

notability 10.0/10

Huge traction, frontier model release.