Repo: OpenBMB (MiniCPM), published Nov 11, 2024

OpenBMB/UltraEval-Audio


Description: Your faithful, impartial partner for audio evaluation. Authentic evaluation: know yourself, know your rivals.

Language: Python

License: Apache-2.0

Stars: 303

Forks: 24

Open issues: 3

Created: 2024-11-11T09:41:43Z

Pushed: 2026-06-10T07:12:04Z

Default branch: main

Fork: no

Archived: no

README: ![assets/logo.png](assets/logo.png)

A Unified Framework for Comprehensive Evaluation of Audio Foundation Models

中文 | English | 💬Discord | UltraEval-Audio Paper

v1.1 Highlights

> - Popular model replication: Added replication support for popular models, including replication result showcases and one-click replication commands (see replication/).
> - Isolated Runtime: Introduced an isolated inference mechanism. Model-specific dependencies are installed/managed automatically; inference runs in the isolated environment and communicates with the main evaluation process via IPC, eliminating dependency conflicts.
> - Specialized model evaluation support: Added specialized audio models for TTS, ASR, and Audio Codec, further expanding evaluation coverage.
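The isolated-runtime idea can be illustrated with a minimal sketch. This is not UltraEval-Audio's actual implementation: it assumes a worker started as a subprocess (standing in for a model's own environment) that exchanges one JSON object per line over stdin/stdout. The names `WORKER_SOURCE` and `run_isolated` are hypothetical.

```python
import json
import subprocess
import sys

# Hypothetical worker: in a real setup this would run under the model's own
# interpreter (for example a dedicated virtualenv) and load the model once.
WORKER_SOURCE = r'''
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    # Placeholder "inference": echo the prompt in upper case.
    reply = {"id": request["id"], "text": request["prompt"].upper()}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
'''


def run_isolated(prompts, python_executable=sys.executable):
    """Send prompts to a worker process and collect its JSON replies."""
    worker = subprocess.Popen(
        [python_executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    replies = []
    try:
        for index, prompt in enumerate(prompts):
            # One request per line; the worker answers with one line each.
            worker.stdin.write(json.dumps({"id": index, "prompt": prompt}) + "\n")
            worker.stdin.flush()
            replies.append(json.loads(worker.stdout.readline()))
    finally:
        # Closing stdin ends the worker's read loop so it can exit cleanly.
        worker.stdin.close()
        worker.wait()
    return replies
```

Pointing `python_executable` at an interpreter from a separate environment is what would keep the worker's dependencies apart from those of the main evaluation process.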

Overview

🚀Exceptional Experience with UltraEval-Audio🚀

UltraEval-Audio is the world's first open-source framework that supports both speech understanding and speech generation evaluation, designed specifically for large audio models. It aggregates 34 authoritative benchmarks covering four major domains (speech, sound, medicine, and music), 10 languages, and 12 task categories. With UltraEval-Audio, you get:

  • Direct Replication of Popular Models 🔬: Provides detailed [replication documentation and commands](./replication/) so you can reproduce the evaluation results of open-source models transparently.
  • One-Click Benchmark Management 📥: Say goodbye to tedious manual downloading and data processing. UltraEval-Audio automates it all, letting you easily acquire well-known benchmark datasets (e.g., Librispeech, TED-LIUM, Seed-TTS-Eval).
  • Built-in Evaluation Tools ⚙️: No need to hunt for evaluation tools. UltraEval-Audio pairs each dataset with its commonly used official evaluation method (e.g., WER, WER-ZH, BLEU, G-Eval), keeping datasets and metrics aligned.
  • Powerful and Flexible 🛠️: Supports preview testing, random sampling, error retries, and resuming from a breakpoint, keeping evaluation flexible and controllable while improving efficiency and accuracy.
  • Seamless Integration of Custom Datasets 💼: Supports not only public benchmarks but also powerful custom dataset integration, allowing rapid application in various engineering scenarios.
  • Easy Integration with Existing Systems 🔗: With excellent extensibility and standardized design, UltraEval-Audio seamlessly connects with your existing evaluation pipelines, simplifying project management and unifying output results.
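As an illustration of the WER metric named in the list above, here is a minimal sketch of word error rate as word-level edit distance divided by the number of reference words. It assumes whitespace tokenization and no text normalization; the framework's own metric may normalize text differently, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)
```

For the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", there is one substitution and one deletion against six reference words, so the result is 2/6.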

![UEA_Architecture](assets/ultraeval_audio_framework.png)
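The random sampling and error retries mentioned in the feature list could look roughly like the sketch below. It is an illustration under stated assumptions, not the framework's code: `sample_items` and `call_with_retries` are hypothetical helpers, and retrying on any exception is a simplification.

```python
import random
import time


def sample_items(items, limit=None, seed=0):
    """Return a reproducible random subset, or everything if limit is None."""
    items = list(items)
    if limit is None or limit >= len(items):
        return items
    # A seeded generator makes the preview subset the same on every run.
    return random.Random(seed).sample(items, limit)


def call_with_retries(fn, item, max_attempts=3, delay_seconds=0.0):
    """Call fn(item), retrying after an exception up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(item)
        except Exception:
            if attempt == max_attempts:
                raise  # Give up and surface the last error.
            time.sleep(delay_seconds)
```

A preview run would then evaluate `sample_items(dataset, limit=10)` and wrap each model call in `call_with_retries`.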

Changelog🔥

  • [2026/06/10]
      • Support [Qwen3-ASR](replication/qwen3_asr.md) evaluation (qwen3-asr-1.7b, qwen3-asr-0.6b), with replication results and commands for English, Chinese, and Chinese dialect ASR benchmarks.
  • [2026/04/20]
      • Support [Fish Speech S2 Pro](replication/fishaudio-s2-pro.md) evaluation, including Seed-TTS-Eval and MiniMax multilingual TTS benchmarks (22 languages)
  • [2026/02/03]
      • Support [Qwen3-TTS](replication/qwen3_tts.md) evaluation
      • GPU parallel acceleration for faster evaluation/inference
          • Usage: add --use_model_pool and --workers to enable multi-GPU parallel inference, e.g.
          • python audio_evals/main.py --dataset --model --use_model_pool --workers 4
  • [2026/01/19]
      • Support Step-Audio-R1.1 evaluation, with replication report: [Step-Audio-R1.1](replication/step-audio-r1_1.md)
  • [2025/12/31]
      • Release v1.1 🎉🎉🎉
      • Add replication docs for popular models: [CosyVoice2](replication/CosyVoice2.md), [CosyVoice3](replication/CosyVoice3.md), [GLM-TTS](replication/GLM-TTS.md), [IndexTTS2](replication/IndexTTS2.md), [VoxCPM](replication/VoxCPM.md)
      • Support Isolated Runtime offline inference
      • Support task-specific audio models for TTS, ASR, and Audio Codec
  • [2025/12/04]
      • Support [Qwen3-Omni](replication/qwen3_omni.md), update [Kimi-Audio](replication/kimi-audio.md)
  • [2025/12/02]
      • 🌟 Added [Replication Results and Command Documentation](./replication/): To better support the open-source community, we have documented the evaluation process and results of current open-source models in detail, so that the evaluation process is fully transparent and reproducible.
      • Support [Long-TTS-Eval](registry/dataset/long-tts-eval.yaml) dataset, see alignment details in [Long-TTS-Eval](./replication/Long-TTS-Eval.md)
      • Support [MGM-Omni TTS](registry/model/mgm_omni.yaml) model, see alignment details in [MGM-Omni](./replication/MGM-Omni.md)
  • [2025/10/30]
      • Support VoxCPM TTS models: --model voxcpm-tts and --model voxcpm-vc
      • Use uv to accelerate model dependency installation 🚀
  • [2025/10/17]
      • [Support seed-tts-eval dataset](docs/seed-tts-eval4voice_clone.md)
  • [2025/05/22]
      • Use audio quality metrics
  • [2025/05/12]
      • Support Qwen2.5-Omni (qwen2.5-omni-audio, qwen2.5-omni-speech) and Kimi-Audio-7B-Instruct (kimiaudio, kimiaudio-speech) models, and update the Audio Understanding Leaderboard
  • [2025/05/08]
      • Faster resume evaluation: with the -r/--resume parameter, the latest breakpoint result is found automatically if no file is specified
      • Support evaluation starting from an inference file: the --infer-file parameter evaluates an existing inference file directly, without regenerating it
  • [2025/03/23]
      • Added support for step-audio model evaluation and ranking
          • Ranking details: [leaderboard.md](assets/leaderboard.md)
          • Evaluation support: Step-Audio-Chat
  • [2025/03/04]
      • Support [resume evaluation](docs/Procedures for Restarting an Incomplete Evaluation.md) via the command-line parameter --resume $checkpoint_res_file
      • Support evaluating a deployed glm-4-voice service with UltraEval-Audio, see details at GLM-4-Voice
      • Support parallel evaluation via the command-line parameter --workers $num_workers
  • [2025/01/13]
      • Release v1.0
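The resume behavior described in the changelog (continue from a breakpoint result, picking the latest one when no file is given) can be sketched as follows. The details here are assumptions for illustration only: results are taken to be JSON Lines records with an `id` field, and `latest_result_file`, `load_finished_ids`, and `resume` are hypothetical names, not the framework's API.

```python
import json
from pathlib import Path


def latest_result_file(result_dir):
    """Return the most recently modified .jsonl file, or None if there is none."""
    candidates = sorted(Path(result_dir).glob("*.jsonl"), key=lambda p: p.stat().st_mtime)
    return candidates[-1] if candidates else None


def load_finished_ids(result_file):
    """Collect the ids of samples that already have a saved result."""
    finished = set()
    with open(result_file, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                finished.add(json.loads(line)["id"])
    return finished


def resume(samples, infer, result_dir, result_file=None):
    """Run infer() only on samples that are missing from the checkpoint."""
    result_dir = Path(result_dir)
    result_dir.mkdir(parents=True, exist_ok=True)
    # Fall back to the latest checkpoint, then to a fresh file.
    result_file = Path(
        result_file or latest_result_file(result_dir) or result_dir / "results.jsonl"
    )
    finished = load_finished_ids(result_file) if result_file.exists() else set()
    with open(result_file, "a", encoding="utf-8") as handle:
        for sample in samples:
            if sample["id"] in finished:
                continue
            record = {"id": sample["id"], "output": infer(sample)}
            handle.write(json.dumps(record) + "\n")
            handle.flush()  # Persist each result so an interruption loses little work.
    return result_file
```

Appending one record per sample is what makes the run restartable: a second call with the same directory skips everything already written.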

Leaderboard

Audio…
