Repo: Qwen (Alibaba Cloud), published Jan 28, 2026

QwenLM/Qwen3-ASR

Python


Description: Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Language: Python

License: Apache-2.0

Stars: 2874

Forks: 290

Open issues: 35

Created: 2026-01-28T05:44:59Z

Pushed: 2026-01-30T03:24:24Z

Default branch: main

Fork: no

Archived: no

README:

Qwen3-ASR

🤗 Hugging Face | 🤖 ModelScope | 📑 Blog | 📑 Paper

🖥️ Hugging Face Demo | 🖥️ ModelScope Demo | 💬 WeChat (微信) | 🫨 Discord | 📑 API

We release Qwen3-ASR, a family that includes two powerful all-in-one speech recognition models that support language identification and ASR for 52 languages and dialects, as well as a novel non-autoregressive speech forced-alignment model that can align text–speech pairs in 11 languages.

News

  • 2026.1.29: 🎉🎉🎉 We have released the Qwen3-ASR series (0.6B/1.7B) and the Qwen3-ForcedAligner-0.6B model. Please check out our blog!

Contents

  • [Overview](#overview)
  • [Introduction](#introduction)
  • [Model Architecture](#model-architecture)
  • [Released Models Description and Download](#released-models-description-and-download)
  • [Quickstart](#quickstart)
  • [Environment Setup](#environment-setup)
  • [Python Package Usage](#python-package-usage)
  • [Quick Inference](#quick-inference)
  • [vLLM Backend](#vllm-backend)
  • [Streaming Inference](#streaming-inference)
  • [ForcedAligner Usage](#forcedaligner-usage)
  • [DashScope API Usage](#dashscope-api-usage)
  • [Launch Local Web UI Demo](#launch-local-web-ui-demo)
  • [Gradio Demo](#gradio-demo)
  • [Streaming Demo](#streaming-demo)
  • [Deployment with vLLM](#deployment-with-vllm)
  • [Fine Tuning](#fine-tuning)
  • [Docker](#docker)
  • [Evaluation](#evaluation)
  • [Citation](#citation)

Overview

Introduction

The Qwen3-ASR family includes Qwen3-ASR-1.7B and Qwen3-ASR-0.6B, which support language identification and ASR for 52 languages and dialects. Both leverage large-scale speech training data and the strong audio understanding capability of their foundation model, Qwen3-Omni. Experiments show that the 1.7B version achieves state-of-the-art performance among open-source ASR models and is competitive with the strongest proprietary commercial APIs. Here are the main features:

  • All-in-one: Qwen3-ASR-1.7B and Qwen3-ASR-0.6B support language identification and speech recognition for 30 languages and 22 Chinese dialects, as well as English accents from multiple countries and regions.
  • Excellent and fast: The Qwen3-ASR models maintain high-quality, robust recognition in complex acoustic environments and on challenging text patterns. Qwen3-ASR-1.7B achieves strong performance on both open-source and internal benchmarks, while the 0.6B version offers an accuracy-efficiency trade-off, reaching 2000× throughput at a concurrency of 128. Both support unified streaming and offline inference with a single model and can transcribe long audio.
  • Novel and strong forced-alignment solution: We introduce Qwen3-ForcedAligner-0.6B, which supports timestamp prediction for arbitrary units in up to 5 minutes of speech across 11 languages. Evaluations show that its timestamp accuracy surpasses that of end-to-end (E2E) forced-alignment models.
  • Comprehensive inference toolkit: In addition to open-sourcing the architectures and weights of the Qwen3-ASR series, we also release a powerful, full-featured inference framework that supports vLLM-based batch inference, asynchronous serving, streaming inference, timestamp prediction, and more.
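To make the throughput figure above more concrete, here is a small arithmetic sketch. It assumes that "2000 times throughput" means roughly 2000 seconds of audio transcribed per second of wall-clock time, aggregated over all 128 concurrent requests; the README excerpt does not define the metric, so treat this reading, and the helper name `wall_clock_seconds`, as assumptions.

```python
def wall_clock_seconds(audio_seconds: float, throughput_factor: float) -> float:
    """Wall-clock time needed to transcribe `audio_seconds` of audio.

    `throughput_factor` is seconds of audio processed per wall-clock second
    (an assumed interpretation of the README's "2000 times throughput").
    """
    if throughput_factor <= 0:
        raise ValueError("throughput_factor must be positive")
    return audio_seconds / throughput_factor


THROUGHPUT = 2000.0  # figure quoted for Qwen3-ASR-0.6B
CONCURRENCY = 128    # concurrency at which the figure is quoted

# Aggregate time to work through one hour of audio under that assumption.
one_hour = wall_clock_seconds(3600.0, THROUGHPUT)

# Average speed-up relative to real time for each concurrent request.
per_request_speedup = THROUGHPUT / CONCURRENCY

print(f"1 h of audio: about {one_hour:.1f} s of wall-clock time in aggregate")
print(f"average per-request speed-up: about {per_request_speedup:.1f}x real time")
```

Under this reading, the headline number describes batch capacity rather than the latency of any single request, which is why the per-request figure is much smaller.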

Model Architecture

Released Models Description and Download

Below is an introduction and download information for the Qwen3-ASR models. Please select and download the model that fits your needs.

| Model | Supported Languages | Supported Dialects | Inference Mode | Audio Types |
|---|---|---|---|---|
| Qwen3-ASR-1.7B & Qwen3-ASR-0.6B | Chinese (zh), English (en), Cantonese (yue), Arabic (ar), German (de), French (fr), Spanish (es), Portuguese (pt), Indonesian (id), Italian (it), Korean (ko), Russian (ru), Thai (th), Vietnamese (vi), Japanese (ja), Turkish (tr), Hindi (hi), Malay (ms), Dutch (nl), Swedish (sv), Danish (da), Finnish (fi), Polish (pl), Czech (cs), Filipino (fil), Persian (fa), Greek (el), Hungarian (hu), Macedonian (mk), Romanian (ro) | Anhui, Dongbei, Fujian, Gansu, Guizhou, Hebei, Henan, Hubei, Hunan, Jiangxi, Ningxia, Shandong, Shaanxi, Shanxi, Sichuan, Tianjin, Yunnan, Zhejiang, Cantonese (Hong Kong accent), Cantonese (Guangdong accent), Wu language, Minnan language | Offline / Streaming | Speech, Singing Voice, Songs with BGM |
| Qwen3-ForcedAligner-0.6B | Chinese, English, Cantonese, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish | -- | NAR | Speech |
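When routing audio in an application, it can help to encode the table as data and check a request before loading a model. The sketch below is illustrative only: the names `ASR_LANGUAGES`, `ALIGNER_LANGUAGES`, `MAX_ALIGN_SECONDS`, and `can_align` are hypothetical and not part of the qwen-asr package, and the aligner's language codes are assumed to match the codes listed for the ASR models.

```python
# Languages listed for Qwen3-ASR-1.7B / Qwen3-ASR-0.6B (code -> name).
ASR_LANGUAGES = {
    "zh": "Chinese", "en": "English", "yue": "Cantonese", "ar": "Arabic",
    "de": "German", "fr": "French", "es": "Spanish", "pt": "Portuguese",
    "id": "Indonesian", "it": "Italian", "ko": "Korean", "ru": "Russian",
    "th": "Thai", "vi": "Vietnamese", "ja": "Japanese", "tr": "Turkish",
    "hi": "Hindi", "ms": "Malay", "nl": "Dutch", "sv": "Swedish",
    "da": "Danish", "fi": "Finnish", "pl": "Polish", "cs": "Czech",
    "fil": "Filipino", "fa": "Persian", "el": "Greek", "hu": "Hungarian",
    "mk": "Macedonian", "ro": "Romanian",
}

# Languages listed for Qwen3-ForcedAligner-0.6B (codes are an assumption).
ALIGNER_LANGUAGES = {"zh", "en", "yue", "fr", "de", "it", "ja", "ko", "pt", "ru", "es"}

# The README states the aligner handles up to 5 minutes of speech.
MAX_ALIGN_SECONDS = 5 * 60


def can_transcribe(lang_code: str) -> bool:
    """True if the language appears in the ASR row of the table."""
    return lang_code in ASR_LANGUAGES


def can_align(lang_code: str, duration_seconds: float) -> bool:
    """True if both the language and the clip length fit the aligner's stated limits."""
    return lang_code in ALIGNER_LANGUAGES and 0 < duration_seconds <= MAX_ALIGN_SECONDS


print(can_transcribe("th"), can_align("th", 120.0))  # Thai: ASR only
print(can_transcribe("en"), can_align("en", 120.0))  # English: ASR and alignment
```

A check like this only mirrors the documented limits; the models may still behave differently on dialects or mixed-language audio.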

During model loading in the qwen-asr package or vLLM, model weights will be downloaded automatically based on the model name. However, if your runtime environment does not allow downloading weights during execution, you can use the following commands to manually download the model weights to a local directory:

# Download through ModelScope (recommended for users in Mainland China)
pip install -U modelscope
modelscope download --model Qwen/Qwen3-ASR-1.7B --local_dir ./Qwen3-ASR-1.7B
modelscope download --model Qwen/Qwen3-ASR-0.6B --local_dir ./Qwen3-ASR-0.6B
modelscope download --model Qwen/Qwen3-ForcedAligner-0.6B --local_dir ./Qwen3-ForcedAligner-0.6B
# Download through Hugging Face
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen3-ASR-1.7B --local-dir ./Qwen3-ASR-1.7B
huggingface-cli download Qwen/Qwen3-ASR-0.6B --local-dir ./Qwen3-ASR-0.6B
huggingface-cli download Qwen/Qwen3-ForcedAligner-0.6B --local-dir ./Qwen3-ForcedAligner-0.6B
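
The same Hugging Face download can also be scripted from Python. This is a minimal sketch using `snapshot_download` from the `huggingface_hub` library; the helper names `local_dir_for` and `download_all` are hypothetical, and the target directories simply mirror the ones used in the commands above.

```python
from pathlib import Path

# Repository IDs taken from the download commands above.
MODEL_IDS = [
    "Qwen/Qwen3-ASR-1.7B",
    "Qwen/Qwen3-ASR-0.6B",
    "Qwen/Qwen3-ForcedAligner-0.6B",
]


def local_dir_for(repo_id: str, root: str = ".") -> Path:
    """Map 'Qwen/Qwen3-ASR-1.7B' to '<root>/Qwen3-ASR-1.7B'."""
    return Path(root) / repo_id.split("/")[-1]


def download_all(root: str = ".") -> list:
    """Download every model into its own directory under `root`.

    Requires network access and `pip install -U huggingface_hub`.
    """
    # Imported lazily so the helpers above work without the dependency.
    from huggingface_hub import snapshot_download

    paths = []
    for repo_id in MODEL_IDS:
        target = local_dir_for(repo_id, root)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
        paths.append(target)
    return paths


# Usage (performs the actual download):
#     download_all(".")
```

Once the weights are on disk, the local directory would presumably be passed wherever a model name is expected, though the excerpt does not show that call.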

Quickstart

Environment Setup

The easiest way to use Qwen3-ASR is to install the qwen-asr Python package from PyPI. This will pull in the required runtime dependencies and allow you to load any released Qwen3-ASR model. If you’d like to simplify environment setup further, you can also use our official [Docker image](#docker). The qwen-asr package provides two backends: the transformers backend and…
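
As a quick sanity check after installation, the snippet below looks up whether a distribution named qwen-asr is present in the current environment. It uses only the standard library's `importlib.metadata`; the function name `installed_version` is hypothetical, and the suggested `pip install -U qwen-asr` command is an assumption based on the PyPI package name given above.

```python
from importlib import metadata


def installed_version(dist_name: str = "qwen-asr"):
    """Return the installed version string of `dist_name`, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    # Assumed install command, based on the package name in the README.
    print("qwen-asr is not installed; try: pip install -U qwen-asr")
else:
    print(f"qwen-asr {version} is installed")
```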
