ModelQwen (Alibaba Cloud)Qwen (Alibaba Cloud)published Apr 15, 2026seen 5d

Qwen/Qwen3.6-35B-A3B

Open original ↗

Captured source

source ↗
published Apr 15, 2026seen 5dcaptured 11hhttp 200method plaintask image-text-to-textlicense apache-2.0library transformersparams 36Bdownloads 5130klikes 2.1k

Qwen3.6-35B-A3B

> [!Note] > This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. > > These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc.

Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience.

Qwen3.6 Highlights

This release delivers substantial upgrades, particularly in

  • Agentic Coding: the model now handles frontend workflows and repository-level reasoning with greater fluency and precision.
  • Thinking Preservation: we've introduced a new option to retain reasoning context from historical messages, streamlining iterative development and reducing overhead.

!Benchmark Results

For more details, please refer to our blog post Qwen3.6-35B-A3B.

Model Overview

  • Type: Causal Language Model with Vision Encoder
  • Training Stage: Pre-training & Post-training
  • Language Model
  • Number of Parameters: 35B in total and 3B activated
  • Hidden Dimension: 2048
  • Token Embedding: 248320 (Padded)
  • Number of Layers: 40
  • Hidden Layout: 10 × (3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE))
  • Gated DeltaNet:
  • Number of Linear Attention Heads: 32 for V and 16 for QK
  • Head Dimension: 128
  • Gated Attention:
  • Number of Attention Heads: 16 for Q and 2 for KV
  • Head Dimension: 256
  • Rotary Position Embedding Dimension: 64
  • Mixture Of Experts
  • Number of Experts: 256
  • Number of Activated Experts: 8 Routed + 1 Shared
  • Expert Intermediate Dimension: 512
  • LM Output: 248320 (Padded)
  • MTP: trained with multi-steps
  • Context Length: 262,144 natively and extensible up to 1,010,000 tokens.

Benchmark Results

Language

Qwen3.5-27BGemma4-31BQwen3.5-35BA3BGemma4-26BA4BQwen3.6-35BA3B

Coding Agent

SWE-bench Verified 75.0 52.0 70.0 17.4 73.4

SWE-bench Multilingual 69.3 51.7 60.3 17.3 67.2

SWE-bench Pro 51.2 35.7 44.6 13.8 49.5

Terminal-Bench 2.0 41.6 42.9 40.5 34.2 51.5

Claw-Eval Avg 64.3 48.5 65.4 58.8 68.7

Claw-Eval Pass^3 46.2 25.0 51.0 28.0 50.0

SkillsBench Avg5 27.2 23.6 4.4 12.3 28.7

QwenClawBench 52.2 41.7 47.7 38.7 52.6

NL2Repo 27.3 15.5 20.5 11.6 29.4

QwenWebBench 1068 1197 978 1178 1397

General Agent

TAU3-Bench 68.4 67.5 68.9 59.0 67.2

VITA-Bench 41.8 43.0 29.1 36.9 35.6

DeepPlanning 22.6 24.0 22.8 16.2 25.9

Tool Decathlon 31.5 21.2 28.7 12.0 26.9

MCPMark 36.3 18.1 27.0 14.2 37.0

MCP-Atlas 68.4 57.2 62.4 50.0 62.8

WideSearch 66.4 35.2 59.1 38.3 60.1

Knowledge

MMLU-Pro 86.1 85.2 85.3 82.6 85.2

MMLU-Redux 93.2 93.7 93.3 92.7 93.3

SuperGPQA 65.6 65.7 63.4 61.4 64.7

C-Eval 90.5 82.6 90.2 82.5 90.0

STEM & Reasoning

GPQA 85.5 84.3 84.2 82.3 86.0

HLE 24.3 19.5 22.4 8.7 21.4

LiveCodeBench v6 80.7 80.0 74.6 77.1 <td style="padding:7px 7px;text-align:center;border-bottom:1px so

Notability

notability 10.0/10

Huge traction, frontier model release.