ModelInclusionAI (Ant Group)InclusionAI (Ant Group)published Feb 12, 2026seen 5d

inclusionAI/ZwZ-4B

Open original ↗

Captured source

source ↗
published Feb 12, 2026seen 5dcaptured 9hhttp 200method plaintask image-text-to-textlicense apache-2.0library transformersparams 4.8Bdownloads 358likes 33

ZwZ-4B

Model Summary

ZwZ-4B is a fine-grained multimodal perception model built upon Qwen3-VL-4B. It is trained using Region-to-Image Distillation (R2I) combined with reinforcement learning, enabling superior fine-grained visual understanding in a single forward pass — no inference-time zooming or tool calling required. ZwZ-4B achieves state-of-the-art performance on fine-grained perception benchmarks among open-source models of comparable size.

Models General Perception Specific Perception OOD Generalization Avg

ZoomBench HR-4K HR-8K VStar CV-B. MME-RW-en MME-RW-cn GP-Avg CountQA ColorB. MMStar BabyVision

Closed-Source Models

GPT-5.1 47.22 67.00 65.25 70.16 84.22 64.04 55.57 64.78 31.41 83.43 71.60 13.92 59.44

Gemini-3-Flash 59.29 87.88 85.00 86.39 89.57 74.86 72.62 79.37 66.88 85.47 83.60 34.51 75.10

Open-Source Models

Qwen3-VL-2B 41.30 71.75 70.12 72.77 78.94 59.52 60.77 65.02 22.19 76.86 60.4 12.11 56.98

Qwen3-VL-4B 40.24 78.25 72.88 80.10 84.95 63.47 63.63 69.07 28.14 81.63 69.73 13.66 61.52

Qwen2.5-VL-7B 42.49 71.62 67.88 78.53 75.34 60.80 58.30 64.99 18.91 76.36 61.93 12.89 56.82

Qwen3-VL-8B 37.87 78.88 74.63 86.39 85.44 65.96 66.67 70.83 28.99 82.77 70.93 12.89 62.86

MiMo-VL-7B-RL 45.09 74.38 72.88 81.15 84.31 63.40 59.78 68.71 28.27 82.80 73.53 16.24 61.98

MiniCPM-V-4.5 (9B) 42.60 69.88 63.62 70.16 80.25 58.16 56.23 62.99 23.43 79.75 67.87 14.95 56.99

GLM-4.5V (108B) 49.23 81.63 74.88 83.25 87.59 66.04 60.71 71.90 35.93 84.59 75.87 15.72 65.04

Qwen3-VL-235B-A22B <td style="pad

Notability

notability 3.0/10

Low downloads, minor model.