What does this repo signal mean?

OpenBMB (MiniCPM) published OpenBMB/UltraEval-Audio, a Python repository. A repository signal like this one surfaces tooling, evaluation, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo OpenBMB/UltraEval-Audio · language Python · new audio eval repo, moderate traction. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

OpenBMB (MiniCPM) Repo: OpenBMB/UltraEval-Audio

Captured source

GitHub: github.com/OpenBMB/UltraEval-Audio

OpenBMB/UltraEval-Audio repository metadata

published Nov 11, 2024 · seen 5d · captured 9h · http 200 · method plain

OpenBMB/UltraEval-Audio

Description: Your faithful, impartial partner for audio evaluation: know yourself, know your rivals. (Chinese tagline, translated: "Genuine evaluation; know yourself and know your rivals.")

Language: Python

License: Apache-2.0

Stars: 303

Forks: 24

Open issues: 3

Created: 2024-11-11T09:41:43Z

Pushed: 2026-06-10T07:12:04Z

Default branch: main

Fork: no

Archived: no

README: ![assets/logo.png](assets/logo.png)

A Unified Framework for Comprehensive Evaluation of Audio Foundation Models

v1.1 Highlights

> - Popular model replication: Added replication support for popular models, including replication result showcases and one-click replication commands (see replication/).
> - Isolated Runtime: Introduced an isolated inference mechanism. Model-specific dependencies are installed/managed automatically; inference runs in the isolated environment and communicates with the main evaluation process via IPC, eliminating dependency conflicts.
> - Specialized model evaluation support: Added specialized audio models for TTS, ASR, and Audio Codec, further expanding evaluation coverage.
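
The isolated-runtime idea in the highlights can be illustrated with a minimal sketch. It is not UltraEval-Audio's actual implementation: the worker source, the JSON-lines protocol, and the names `WORKER_SOURCE`, `IsolatedWorker`, and `infer` are all assumptions made for illustration. A real setup would presumably point `python_executable` at the interpreter of a separate environment holding the model's dependencies; here the current interpreter stands in for it.

```python
import json
import subprocess
import sys

# Hypothetical worker: a real one would load a model inside its own
# environment. This stand-in just upper-cases the prompt it receives.
WORKER_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"id": request["id"], "output": request["prompt"].upper()}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


class IsolatedWorker:
    """Talks to a child process over stdin/stdout, one JSON object per line."""

    def __init__(self, python_executable=sys.executable):
        self._proc = subprocess.Popen(
            [python_executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def infer(self, request_id, prompt):
        # Send one request line, then block until the matching reply line arrives.
        self._proc.stdin.write(json.dumps({"id": request_id, "prompt": prompt}) + "\n")
        self._proc.stdin.flush()
        return json.loads(self._proc.stdout.readline())

    def close(self):
        # Closing stdin signals end-of-input, so the worker loop exits.
        self._proc.stdin.close()
        self._proc.wait(timeout=10)
        self._proc.stdout.close()


if __name__ == "__main__":
    worker = IsolatedWorker()
    try:
        print(worker.infer(1, "hello world"))
    finally:
        worker.close()
```

Because the main process only exchanges serialized messages with the worker, the two sides never import each other's packages, which is the property the highlight credits with eliminating dependency conflicts.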

Overview

🚀Exceptional Experience with UltraEval-Audio🚀

UltraEval-Audio is the world's first open-source framework to support both speech understanding and speech generation evaluation, designed specifically for large audio models. It aggregates 34 authoritative benchmarks across four major domains (speech, sound, medicine, and music), supporting 10 languages and 12 task categories. With UltraEval-Audio, you get:

- Direct Replication of Popular Models 🔬: Provides detailed [replication documentation and commands](./replication/), ensuring you can easily reproduce evaluation results of open-source models with complete transparency and reproducibility.
- One-Click Benchmark Management 📥: Say goodbye to tedious manual downloading and data processing. UltraEval-Audio automates it all, letting you easily acquire well-known benchmark datasets (e.g., Librispeech, TED-LIUM, Seed-TTS-Eval).
- Built-in Evaluation Tools ⚙️: No need to hunt for evaluation tools. UltraEval-Audio binds datasets with commonly used official evaluation methods (e.g., WER, WER-ZH, BLEU, G-Eval) to ensure alignment between datasets and metrics.
- Powerful and Flexible 🛠️: Supports preview testing, random sampling, error retries, and resume-from-breakpoint, ensuring a flexible and controllable evaluation process while boosting efficiency and accuracy.
- Seamless Integration of Custom Datasets 💼: Supports not only public benchmarks but also powerful custom dataset integration, allowing rapid application in various engineering scenarios.
- Easy Integration with Existing Systems 🔗: With excellent extensibility and standardized design, UltraEval-Audio seamlessly connects with your existing evaluation pipelines, simplifying project management and unifying output results.
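
As an illustration of the kind of metric the framework binds to ASR datasets, the following is a minimal sketch of word error rate (WER), computed as word-level edit distance divided by the reference length. It is a textbook formulation, not UltraEval-Audio's own code; the function name `word_error_rate` and the whitespace tokenization are assumptions, and real pipelines usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing from hypothesis
            insertion = current[j - 1] + 1   # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, since the denominator is the reference length only.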

![UEA_Architecture](assets/ultraeval_audio_framework.png)

Changelog🔥

[2026/06/10]
Support [Qwen3-ASR](replication/qwen3_asr.md) evaluation (qwen3-asr-1.7b, qwen3-asr-0.6b), with replication results and commands for English, Chinese, and Chinese dialect ASR benchmarks.
[2026/04/20]
Support [Fish Speech S2 Pro](replication/fishaudio-s2-pro.md) evaluation, including Seed-TTS-Eval and MiniMax multilingual TTS benchmarks (22 languages)
[2026/02/03]
Support [Qwen3-TTS](replication/qwen3_tts.md) evaluation
GPU parallel acceleration for faster evaluation/inference
Usage: add --use_model_pool and --workers to enable multi-GPU parallel inference, e.g.
python audio_evals/main.py --dataset --model --use_model_pool --workers 4
[2026/01/19]
Support Step-Audio-R1.1 evaluation, with replication report: [Step-Audio-R1.1](replication/step-audio-r1_1.md)
[2025/12/31]
Release v1.1 🎉🎉🎉
Add replication docs for popular models: [CosyVoice2](replication/CosyVoice2.md), [CosyVoice3](replication/CosyVoice3.md), [GLM-TTS](replication/GLM-TTS.md), [IndexTTS2](replication/IndexTTS2.md), [VoxCPM](replication/VoxCPM.md)
Support Isolated Runtime offline inference
Support task-specific audio models for TTS, ASR, and Audio Codec
[2025/12/04]
Support [Qwen3-Omni](replication/qwen3_omni.md), update [Kimi-Audio](replication/kimi-audio.md)
[2025/12/02]
🌟 Added [Replication Results and Command Documentation](./replication/): To better support the open-source community, we have detailed the evaluation process and results of current open-source models, ensuring the evaluation process is completely transparent and reproducible.
Support [Long-TTS-Eval](registry/dataset/long-tts-eval.yaml) dataset, see alignment details in [Long-TTS-Eval](./replication/Long-TTS-Eval.md)
Support [MGM-Omni TTS](registry/model/mgm_omni.yaml) model, see alignment details in [MGM-Omni](./replication/MGM-Omni.md)
[2025/10/30]
Support VoxCPM TTS model: --model voxcpm-tts --model voxcpm-vc
Use uv to accelerate model dependency installation 🚀
[2025/10/17]
[Support seed-tts-eval dataset](docs/seed-tts-eval4voice_clone.md)
[2025/05/22]
Use audio quality metrics
[2025/05/12]
Support Qwen2.5-Omni (qwen2.5-omni-audio, qwen2.5-omni-speech) and Kimi-Audio-7B-Instruct (kimiaudio, kimiaudio-speech) models, and update the Audio Understanding Leaderboard
[2025/05/08]
Faster resume evaluation: the -r/--resume parameter automatically searches for the latest breakpoint result if no file is specified
Support evaluation starting from an inference file: the --infer-file parameter allows direct evaluation from an existing inference file without regeneration
[2025/03/23]
Added support for step-audio model evaluation and ranking
Ranking details: [leaderboard.md](assets/leaderboard.md)
Evaluation support: Step-Audio-Chat
[2025/03/04]
Support [resume evaluation](docs/Procedures for Restarting an Incomplete Evaluation.md) via the command-line parameter --resume $checkpoint_res_file
glm-4-voice service deployment now supports UltraEval-Audio evaluation; see details at GLM-4-Voice
Support parallel evaluation via the command-line parameter --workers $num_workers
[2025/01/13] release v1.0
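
The resume and parallel-worker entries in the changelog can be pictured with a small sketch. It does not show how UltraEval-Audio implements `--resume` or `--workers`; the JSONL checkpoint layout and the names `load_done_ids`, `run_with_resume`, and `infer_fn` are hypothetical, chosen only to show the general pattern: skip items already recorded in a results file, then process the remainder with a worker pool and append each result as it completes.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_done_ids(checkpoint_path):
    """Return the ids already present in a JSONL checkpoint (hypothetical layout)."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path, encoding="utf-8") as f:
        return {json.loads(line)["id"] for line in f if line.strip()}


def run_with_resume(items, infer_fn, checkpoint_path, workers=4):
    """Run infer_fn on items not yet checkpointed; append one JSON line per result.

    Returns the number of items that were actually processed in this call.
    """
    done = load_done_ids(checkpoint_path)
    pending = [item for item in items if item["id"] not in done]
    with open(checkpoint_path, "a", encoding="utf-8") as out, \
            ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(infer_fn, item): item for item in pending}
        for future in as_completed(futures):
            record = {"id": futures[future]["id"], "output": future.result()}
            # Only the main thread writes, so output lines never interleave.
            out.write(json.dumps(record) + "\n")
            out.flush()
    return len(pending)
```

Appending and flushing after every completed item means an interrupted run loses at most the work in flight, and a later call with the same checkpoint path picks up where the previous one stopped.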

Leaderboard

Audio…

Excerpt shown — open the source for the full document.

Notability

notability 5.0/10

New audio eval repo, moderate traction.