What does this repo signal mean?

Qwen (Alibaba Cloud) published QwenLM/Qwen2.5-Omni (Jupyter Notebook). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo QwenLM/Qwen2.5-Omni · language Jupyter Notebook · New Qwen Omni model with high stars. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Qwen (Alibaba Cloud) Repo: QwenLM/Qwen2.5-Omni

Captured source

GitHub: github.com/QwenLM/Qwen2.5-Omni

QwenLM/Qwen2.5-Omni repository metadata

published Mar 22, 2025 · seen 6d · captured 8h · http 200 · method plain

QwenLM/Qwen2.5-Omni

Description: Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Language: Jupyter Notebook

License: Apache-2.0

Stars: 4021

Forks: 324

Open issues: 220

Created: 2025-03-22T01:43:13Z

Pushed: 2025-06-12T11:03:07Z

Default branch: main

Fork: no

Archived: no
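The metadata fields above line up with keys returned by the GitHub REST API's repository endpoint (`GET /repos/{owner}/{repo}`). The following is a minimal sketch of how such a listing could be derived from that payload; the `summarize_repo` helper, the `days_active` field, and the inline sample payload are illustrative assumptions, not part of the captured page. The sample values are copied from the listing rather than fetched live, and the API's `open_issues_count` includes open pull requests.

```python
from datetime import datetime


def parse_github_time(value: str) -> datetime:
    """Parse a GitHub timestamp such as '2025-03-22T01:43:13Z' as UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_repo(payload: dict) -> dict:
    """Select the fields shown in the metadata listing from a repo payload."""
    license_info = payload.get("license") or {}  # license may be null
    created = parse_github_time(payload["created_at"])
    pushed = parse_github_time(payload["pushed_at"])
    return {
        "full_name": payload["full_name"],
        "language": payload.get("language"),
        "license": license_info.get("spdx_id"),
        "stars": payload["stargazers_count"],
        "forks": payload["forks_count"],
        "open_issues": payload["open_issues_count"],
        "default_branch": payload["default_branch"],
        "fork": payload["fork"],
        "archived": payload["archived"],
        # Hypothetical derived field: whole days between creation and last push.
        "days_active": (pushed - created).days,
    }


# Sample payload using the values from the listing (not a live API response).
sample = {
    "full_name": "QwenLM/Qwen2.5-Omni",
    "language": "Jupyter Notebook",
    "license": {"spdx_id": "Apache-2.0"},
    "stargazers_count": 4021,
    "forks_count": 324,
    "open_issues_count": 220,
    "created_at": "2025-03-22T01:43:13Z",
    "pushed_at": "2025-06-12T11:03:07Z",
    "default_branch": "main",
    "fork": False,
    "archived": False,
}

summary = summarize_repo(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Keeping the parsing separate from any network call means the same helper can be run against a saved capture as well as a live response.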

README:

Qwen2.5-Omni

🖥️ Demo | 💬 WeChat (微信) | 🫨 Discord | 📑 API

We release Qwen2.5-Omni, the new flagship end-to-end multimodal model in the Qwen series. Designed for comprehensive multimodal perception, it seamlessly processes diverse inputs including text, images, audio, and video, while delivering real-time streaming responses through both text generation and natural speech synthesis. Watch the video below for more information 😃

News

2025.06.12: Qwen2.5-Omni-7B ranked first among open-source models on MMSU, a spoken language understanding and reasoning benchmark.
2025.06.09: Congratulations to our open-source Qwen2.5-Omni-7B for ranking first on the MMAU leaderboard and first among open-source models on MMAR in audio understanding and reasoning evaluation!
2025.05.16: We released 4-bit quantized Qwen2.5-Omni-7B (GPTQ-Int4/AWQ) models that maintain performance comparable to the original version on multimodal evaluations while reducing GPU VRAM consumption by over 50%. See [GPTQ-Int4 and AWQ Usage](#gptq-int4-and-awq-usage) for details; the models are available from Hugging Face (GPTQ-Int4|AWQ) and ModelScope (GPTQ-Int4|AWQ).
2025.05.13: The MNN Chat App now supports Qwen2.5-Omni, so you can try Qwen2.5-Omni on edge devices! Please refer to [Deployment with MNN](#deployment-with-mnn) for memory consumption and inference speed benchmarks.
2025.04.30: Exciting! We have released Qwen2.5-Omni-3B to enable more platforms to run Qwen2.5-Omni. The model can be downloaded from Hugging Face. The [performance](#performance) section has been updated for this model; please refer to [Minimum GPU memory requirements](#minimum-gpu-memory-requirements) for resource consumption. For the best experience, the [transformers](#--transformers-usage) and [vllm](#deployment-with-vllm) code has been updated; pull the [official docker](#-docker) image again to get it.
2025.04.11: We released a new vllm version that now supports audio output! Try it from source or with our docker image.
2025.04.02: ⭐️⭐️⭐️ Qwen2.5-Omni reaches top-1 on Hugging Face Trending!
2025.03.29: ⭐️⭐️⭐️ Qwen2.5-Omni reaches top-2 on Hugging Face Trending!
2025.03.26: Real-time interaction with Qwen2.5-Omni is available on Qwen Chat. Let's start this amazing journey now!
2025.03.26: We have released Qwen2.5-Omni. For more details, please check our blog!
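
The VRAM claim in the 2025.05.16 entry can be put in rough perspective with weights-only arithmetic. The sketch below assumes a nominal 7e9 parameters (read off the "7B" in the model name, not a measured count) and counts weight storage only; the `weight_memory_gb` helper is hypothetical and ignores activations, caches, quantization scales, and any components left unquantized.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Weights-only storage in GB (1 GB = 1e9 bytes).

    Ignores activations, KV cache, and quantization metadata.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: nominal parameter count taken from the "7B" model name.
NOMINAL_PARAMS = 7e9

fp16_gb = weight_memory_gb(NOMINAL_PARAMS, 16)  # 16-bit baseline
int4_gb = weight_memory_gb(NOMINAL_PARAMS, 4)   # 4-bit quantized weights
reduction = 1 - int4_gb / fp16_gb

print(f"16-bit weights: {fp16_gb:.1f} GB")          # 14.0 GB
print(f"4-bit weights:  {int4_gb:.1f} GB")          # 3.5 GB
print(f"weights-only reduction: {reduction:.0%}")   # 75%
```

Measured VRAM depends on more than weight storage, so this figure is only an upper-level estimate for the weights themselves; the README's own memory tables remain the authoritative numbers.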

[Overview](#overview)
[Introduction](#introduction)
[Key Features](#key-features)
[Model Architecture](#model-architecture)
[Performance](#performance)
[Quickstart](#quickstart)
[Transformers Usage](#--transformers-usage)
[ModelScope Usage](#-modelscope-usage)
[GPTQ-Int4 and AWQ Usage](#gptq-int4-and-awq-usage)
[Usage Tips](#usage-tips)
[Cookbooks for More Usage Cases](#cookbooks-for-more-usage-cases)
[API inference](#api-inference)
[Customization Settings](#customization-settings)
[Chat with Qwen2.5-Omni](#chat-with-qwen25-omni)
[Online Demo](#online-demo)
[Launch Local Web UI Demo](#launch-local-web-ui-demo)
[Real-Time Interaction](#real-time-interaction)
[Deployment with vLLM](#deployment-with-vllm)
[Deployment with MNN](#deployment-with-mnn)
[Docker](#-docker)

Overview

Introduction

Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.

Key Features

Omni and Novel Architecture: We propose the Thinker-Talker architecture, an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner. We also propose a novel position embedding, TMRoPE (Time-aligned Multimodal RoPE), to synchronize the timestamps of video inputs with audio.

Real-Time Voice and Video Chat: Architecture designed for fully real-time interactions, supporting chunked input and immediate output.

Natural and Robust Speech Generation: Surpassing many existing streaming and non-streaming alternatives, demonstrating superior robustness and naturalness in speech generation.

Strong Performance Across Modalities: Exhibiting exceptional performance across all modalities when benchmarked against similarly sized single-modality models. Qwen2.5-Omni outperforms the similarly sized Qwen2-Audio in audio capabilities and achieves comparable performance to Qwen2.5-VL-7B.

Excellent End-to-End Speech Instruction Following: Qwen2.5-Omni shows performance in end-to-end speech instruction following that rivals its effectiveness with text inputs, evidenced by benchmarks such as MMLU and GSM8K.
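
The idea of synchronizing video timestamps with audio, mentioned under the architecture feature above, can be illustrated with a toy example. This is not the TMRoPE implementation, which the excerpt does not describe; it is a minimal sketch that assumes each frame carries a capture time in milliseconds and that a shared temporal index is obtained by integer division with a hypothetical 40 ms step. The `Frame` type and `temporal_ids` function are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    modality: str  # "audio" or "video"
    time_ms: int   # capture timestamp in milliseconds


def temporal_ids(frames, step_ms=40):
    """Assign a shared temporal index to frames from any modality.

    Frames are ordered by timestamp, and frames that fall in the same
    step_ms window receive the same index regardless of modality.
    step_ms=40 is an illustrative assumption.
    """
    ordered = sorted(frames, key=lambda f: f.time_ms)  # stable for ties
    return [(f.modality, f.time_ms // step_ms) for f in ordered]


# Audio sampled every 40 ms, video frames arriving less often.
frames = [Frame("audio", t) for t in (0, 40, 80, 120)]
frames += [Frame("video", t) for t in (0, 80)]

aligned = temporal_ids(frames)
for modality, index in aligned:
    print(f"{modality:5s} -> temporal id {index}")
```

Because both modalities are indexed on the same time axis, a video frame and the audio captured at the same moment end up with the same temporal position, which is the property the feature description emphasizes.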

Model Architecture

Performance

We conducted a comprehensive evaluation of Qwen2.5-Omni, which demonstrates strong performance across all modalities when compared to similarly sized single-modality models and closed-source models, such as Qwen2.5-VL-7B, Qwen2-Audio, and Gemini-1.5-pro. In tasks requiring the integration of multiple…

Excerpt shown — open the source for the full document.

Notability

notability 8.0/10

New Qwen Omni model with high stars.