What does this repo signal mean?

Baidu (ERNIE) published PaddlePaddle/PaddleSpeech (Python). A repository signal like this one surfaces tooling, evaluation, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo PaddlePaddle/PaddleSpeech · language Python. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Baidu (ERNIE) Repo: PaddlePaddle/PaddleSpeech

Captured source

GitHub · github.com/PaddlePaddle/PaddleSpeech

PaddlePaddle/PaddleSpeech repository metadata

published Nov 14, 2017 · seen 5d · captured 9h · http 200 · method plain

PaddlePaddle/PaddleSpeech

Description: Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Language: Python

License: Apache-2.0

Stars: 12614

Forks: 1957

Open issues: 271

Created: 2017-11-14T12:36:30Z

Pushed: 2026-06-10T06:42:49Z

Default branch: develop

Fork: no

Archived: no

README: ([Simplified Chinese](./README_cn.md) | English)

PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for a variety of critical speech and audio tasks, with state-of-the-art and influential models.

PaddleSpeech won the NAACL2022 Best Demo Award. Please check out our paper on arXiv.

##### Speech Recognition

| Input Audio | Recognition Result |
| --- | --- |
| (audio not captured) | I knocked at the door on the ancient side of the building. |
| (audio not captured) | 我认为跑步最重要的就是给我带来了身体健康。 |

##### Speech Translation (English to Chinese)

| Input Audio | Translation Result |
| --- | --- |
| (audio not captured) | 我在这栋建筑的古老门上敲门。 |

##### Text-to-Speech

| Input Text | Synthetic Audio |
| --- | --- |
| Life was like a box of chocolates, you never know what you're gonna get. | (audio not captured) |
| 早上好，今天是2020/10/29，最低温度是-3°C。 | (audio not captured) |
| 季姬寂，集鸡，鸡即棘鸡。棘鸡饥叽，季姬及箕稷济鸡。鸡既济，跻姬笈，季姬忌，急咭鸡，鸡急，继圾几，季姬急，即籍箕击鸡，箕疾击几伎，伎即齑，鸡叽集几基，季姬急极屐击鸡，鸡既殛，季姬激，即记《季姬击鸡记》。 | (audio not captured) |
| 大家好，我是 parrot 虚拟老师，我们来读一首诗，我与春风皆过客，I and the spring breeze are passing by，你携秋水揽星河，you take the autumn water to take the galaxy。 | (audio not captured) |
| 宜家唔系事必要你讲，但系你所讲嘅说话将会变成呈堂证供。 | (audio not captured) |
| 各个国家有各个国家嘅国歌 | (audio not captured) |

For more synthesized audios, please refer to PaddleSpeech Text-to-Speech samples.

##### Punctuation Restoration

| Input Text | Output Text |
| --- | --- |
| 今天的天气真不错啊你下午有空吗我想约你一起去吃饭 | 今天的天气真不错啊！你下午有空吗？我想约你一起去吃饭。 |
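The input/output pair above could be reproduced from the command line roughly as follows. This is a minimal sketch, not taken from this excerpt: it assumes the `paddlespeech text --task punc --input ...` CLI form from the project's quick-start documentation, and the helper names `build_punc_command` and `restore_punctuation` are hypothetical. Check the flags against the installed version before relying on them.

```python
import shutil
import subprocess
from typing import List, Optional


def build_punc_command(text: str) -> List[str]:
    """Build the argv for a punctuation-restoration call.

    Assumption: the CLI accepts `text --task punc --input <text>`.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return ["paddlespeech", "text", "--task", "punc", "--input", text]


def restore_punctuation(text: str) -> Optional[str]:
    """Run the CLI if it is installed; return None when it is not on PATH."""
    if shutil.which("paddlespeech") is None:
        return None
    done = subprocess.run(
        build_punc_command(text),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    # stdout may also contain log lines, so it is returned unparsed.
    return done.stdout.strip()
```

Passing the text as a separate argv element, rather than through a shell string, avoids quoting problems with non-ASCII input.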

Features

Through an easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial applications and academic research, covering training, inference and testing modules, and the deployment process. More specifically, this toolkit features:

📦 Ease of Use: low barriers to install; [CLI](#quick-start), [Server](#quick-start-server), and [Streaming Server](#quick-start-streaming-server) are available to quick-start your journey.
🏆 Align to the State-of-the-Art: we provide high-speed and ultra-lightweight models, as well as cutting-edge technology.
🏆 Streaming ASR and TTS System: we provide production-ready streaming ASR and streaming TTS systems.
💯 Rule-based Chinese frontend: our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt to the Chinese context.
📦 A Variety of Functions for both Industry and Academia:
🛎️ *Implementation of critical audio tasks*: this toolkit contains audio functions like Automatic Speech Recognition, Text-to-Speech Synthesis, Speaker Verification, Keyword Spotting, Audio Classification, and Speech Translation, etc.
🔬 *Integration of mainstream models and datasets*: the toolkit implements modules that cover the whole pipeline of the speech tasks, and uses mainstream datasets like LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model list](#model-list) for more details.
🧩 *Cascaded models application*: as an extension of the typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural Language Processing (NLP) and Computer Vision (CV).
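To make the Text Normalization step of the rule-based frontend concrete, here is a toy sketch that rewrites the date and temperature from the TTS sample sentence into readable Chinese before synthesis. It is not PaddleSpeech's implementation: the regexes, the function names, and the reading conventions (year read digit by digit, a negative temperature read with 零下) are assumptions chosen for illustration.

```python
import re

DIGITS = "零一二三四五六七八九"


def read_digits(number: str) -> str:
    """Read a digit string one digit at a time, e.g. a year."""
    return "".join(DIGITS[int(ch)] for ch in number)


def read_number(value: int) -> str:
    """Read an integer from 0 to 99 as a Chinese numeral."""
    if not 0 <= value <= 99:
        raise ValueError("only 0..99 is supported in this sketch")
    tens, ones = divmod(value, 10)
    if tens == 0:
        return DIGITS[ones]
    prefix = "" if tens == 1 else DIGITS[tens]  # 10 is read 十, not 一十
    return prefix + "十" + (DIGITS[ones] if ones else "")


DATE_RE = re.compile(r"(\d{4})/(\d{1,2})/(\d{1,2})")
TEMP_RE = re.compile(r"(-?)(\d{1,2})°C")


def _read_date(match: "re.Match[str]") -> str:
    year, month, day = match.groups()
    return f"{read_digits(year)}年{read_number(int(month))}月{read_number(int(day))}日"


def _read_temperature(match: "re.Match[str]") -> str:
    sign, degrees = match.groups()
    return ("零下" if sign else "") + read_number(int(degrees)) + "摄氏度"


def normalize(text: str) -> str:
    """Apply the date rule, then the temperature rule."""
    text = DATE_RE.sub(_read_date, text)
    return TEMP_RE.sub(_read_temperature, text)


print(normalize("早上好，今天是2020/10/29，最低温度是-3°C。"))
```

A production frontend needs many more rules (phone numbers, currencies, ranges, fractions) and must resolve conflicts between them; this sketch only shows the pattern of matching a written form and substituting its spoken form.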

Recent Update

🎉 2025.09.01: Add Whisper large v3 and turbo model.
🤗 2025.08.11: Add [code-switch online model and server demo](./examples/tal_cs/asr1/).
👑 2023.05.31: Add WavLM ASR-en, WavLM fine-tuning for ASR on LibriSpeech.
🎉 2023.05.18: Add Squeezeformer, Squeezeformer training for ASR on Aishell.
👑 2023.05.04: Add HuBERT ASR-en, HuBERT fine-tuning for ASR on LibriSpeech.
⚡ 2023.04.28: Fix 0-d tensor handling: with the upgrade to paddlepaddle==2.5, the problem of modifying 0-d tensors has been solved.
👑 2023.04.25: Add AMP for U2 conformer.
🔥 2023.04.06: Add [subtitle file (.srt format) generation example](./demos/streaming_asr_server).
🔥 2023.03.14: Add SVS (Singing Voice Synthesis) examples with the Opencpop dataset, including [DiffSinger](./examples/opencpop/svs1), [PWGAN](./examples/opencpop/voc1) and [HiFiGAN](./examples/opencpop/voc5); results are still being optimized.
👑 2023.03.09: Add [Wav2vec2ASR-zh](./examples/aishell/asr3).
🎉 2023.03.07: Add [TTS ARM Linux C++ Demo (with C++ Chinese Text Frontend)](./demos/TTSArmLinux).
🔥 2023.03.03: Add Voice Conversion [StarGANv2-VC synthesize pipeline](./examples/vctk/vc3).
🎉 2023.02.16: Add [Cantonese TTS](./examples/canton/tts3).
🔥 2023.01.10: Add [code-switch asr CLI and Demos](./demos/speech_recognition).
👑 2023.01.06: Add [code-switch asr tal_cs recipe](./examples/tal_cs/asr1/).
🎉 2022.12.02: Add [end-to-end Prosody Prediction pipeline](./examples/csmsc/tts3_rhy) (including using prosody labels in Acoustic Model).
🎉 2022.11.30: Add [TTS Android Demo](./demos/TTSAndroid).
🤗 2022.11.28: PP-TTS and PP-ASR demos are available in AIStudio and [official website of paddlepaddle](https://www.paddlepaddle.org.cn/models).
👑 2022.11.18: Add Whisper CLI and Demos, supporting multilingual recognition and translation.
🔥 2022.11.18: Add [Wav2vec2 CLI and Demos](./demos/speech_ssl), supporting ASR and feature extraction.
🎉 2022.11.17: Add male voice for TTS.
🔥 2022.11.07: Add U2/U2++ C++ High Performance Streaming ASR Deployment.
👑 2022.11.01: Add…

Excerpt shown — open the source for the full document.