PaddlePaddle/PaddleSpeech
Description: Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Language: Python
License: Apache-2.0
Stars: 12614
Forks: 1957
Open issues: 271
Created: 2017-11-14T12:36:30Z
Pushed: 2026-06-10T06:42:49Z
Default branch: develop
Fork: no
Archived: no
README: ([简体中文](./README_cn.md)|English)
------------------------------------------------------------------------------------
PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models.
PaddleSpeech won the NAACL 2022 Best Demo Award; please check out our paper on arXiv.
##### Speech Recognition
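| Input Audio | Recognition Result |
| --- | --- |
| *(audio sample)* | I knocked at the door on the ancient side of the building. |
| *(audio sample)* | 我认为跑步最重要的就是给我带来了身体健康。 (I think the most important thing about running is that it has brought me good health.) |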
##### Speech Translation (English to Chinese)
| Input Audio | Translation Result |
| --- | --- |
| *(audio sample)* | 我 在 这栋 建筑 的 古老 门上 敲门。 (I knocked on the ancient door of this building.) |
##### Text-to-Speech
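| Input Text | Synthetic Audio |
| --- | --- |
| Life was like a box of chocolates, you never know what you're gonna get. | *(audio sample)* |
| 早上好,今天是2020/10/29,最低温度是-3°C。 (Good morning, today is 2020/10/29, and the lowest temperature is -3°C.) | *(audio sample)* |
| 季姬寂,集鸡,鸡即棘鸡。棘鸡饥叽,季姬及箕稷济鸡。鸡既济,跻姬笈,季姬忌,急咭鸡,鸡急,继圾几,季姬急,即籍箕击鸡,箕疾击几伎,伎即齑,鸡叽集几基,季姬急极屐击鸡,鸡既殛,季姬激,即记《季姬击鸡记》。 (A Chinese tongue twister in which every syllable is pronounced *ji*, in different tones.) | *(audio sample)* |
| 大家好,我是 parrot 虚拟老师,我们来读一首诗,我与春风皆过客,I and the spring breeze are passing by,你携秋水揽星河,you take the autumn water to take the galaxy。 (Hello everyone, I'm parrot, a virtual teacher. Let's read a poem, with each Chinese line followed by its English rendering.) | *(audio sample)* |
| 宜家唔系事必要你讲,但系你所讲嘅说话将会变成呈堂证供。 (Cantonese: You are not required to say anything now, but what you say will be used as evidence in court.) | *(audio sample)* |
| 各个国家有各个国家嘅国歌 (Cantonese: Every country has its own national anthem.) | *(audio sample)* |

For more synthesized audio, please refer to the PaddleSpeech Text-to-Speech samples.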
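Before the sample sentence `早上好,今天是2020/10/29,最低温度是-3°C。` can be synthesized, the date and the negative temperature have to be spelled out as words; this is the job of the Text Normalization step in a Chinese TTS frontend. The following is a minimal rule-based sketch in plain Python, not PaddleSpeech's own frontend: the function names, the two regex rules, and the chosen readings (the year read digit by digit, `-3°C` read as 零下三度) are illustrative assumptions that only cover these two patterns.

```python
import re

DIGITS = "零一二三四五六七八九"

# Hypothetical rules: only "YYYY/MM/DD" dates and integer Celsius temperatures.
DATE_RE = re.compile(r"(\d{4})/(\d{1,2})/(\d{1,2})")
TEMP_RE = re.compile(r"(-?)(\d{1,2})°C")


def read_digits(s: str) -> str:
    """Read a digit string one digit at a time, e.g. '2020' -> '二零二零'."""
    return "".join(DIGITS[int(c)] for c in s)


def read_int(n: int) -> str:
    """Read an integer in [0, 99] as a Chinese cardinal, e.g. 29 -> '二十九'."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 10:
        return DIGITS[n]
    tens, ones = divmod(n, 10)
    # 10-19 are read without a leading 一 (十, 十一, ...).
    prefix = "" if tens == 1 else DIGITS[tens]
    return prefix + "十" + (DIGITS[ones] if ones else "")


def _expand_date(m: re.Match) -> str:
    year, month, day = m.groups()
    return f"{read_digits(year)}年{read_int(int(month))}月{read_int(int(day))}日"


def _expand_temp(m: re.Match) -> str:
    sign, value = m.groups()
    # A leading minus sign is read as 零下 ("below zero").
    return ("零下" if sign else "") + read_int(int(value)) + "度"


def normalize(text: str) -> str:
    """Replace dates and temperatures with their spoken Chinese forms."""
    text = DATE_RE.sub(_expand_date, text)
    return TEMP_RE.sub(_expand_temp, text)


print(normalize("早上好,今天是2020/10/29,最低温度是-3°C。"))
# 早上好,今天是二零二零年十月二十九日,最低温度是零下三度。
```

A production frontend needs many more rules (ranges, phone numbers, units, context-dependent choices such as 二 versus 两); the sketch only shows the general shape of regex-driven, rule-based normalization.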
##### Punctuation Restoration
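| Input Text | Output Text |
| --- | --- |
| 今天的天气真不错啊你下午有空吗我想约你一起去吃饭 | 今天的天气真不错啊!你下午有空吗?我想约你一起去吃饭。 (The weather is really nice today! Are you free this afternoon? I'd like to invite you out for a meal.) |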
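Punctuation restoration is commonly framed as sequence labeling: a classifier predicts, for every character of the unpunctuated input, which mark (if any) should follow it, and the marks are then spliced back into the text. The sketch below shows only that final decoding step, assuming this formulation; the label set is a hypothetical example, and the hard-coded label ids stand in for a trained model's predictions. It is not PaddleSpeech's API.

```python
from typing import List

# Hypothetical label set: id 0 means "no punctuation after this character".
LABELS = ["", "，", "。", "？", "！"]


def apply_punctuation(text: str, label_ids: List[int]) -> str:
    """Append the predicted punctuation mark after each character."""
    if len(text) != len(label_ids):
        raise ValueError("expected exactly one label per character")
    return "".join(ch + LABELS[i] for ch, i in zip(text, label_ids))


text = "今天的天气真不错啊你下午有空吗我想约你一起去吃饭"
# Stand-in for model output: every character gets id 0 except three positions.
predicted = [0] * len(text)
predicted[8] = 4   # after 啊 -> ！
predicted[14] = 3  # after 吗 -> ？
predicted[23] = 2  # after 饭 -> 。

print(apply_punctuation(text, predicted))
# 今天的天气真不错啊！你下午有空吗？我想约你一起去吃饭。
```

Keeping one label per character makes the decoding step trivial; all the difficulty lives in the classifier that produces the labels.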
### Features
Through an easy-to-use, efficient, flexible, and scalable implementation, our vision is to empower both industrial applications and academic research, covering training, inference and testing modules, and the deployment process. Specifically, this toolkit features:
- 📦 Ease of Use: low barriers to installation; the [CLI](#quick-start), [Server](#quick-start-server), and [Streaming Server](#quick-start-streaming-server) are available to quick-start your journey.
- 🏆 Aligned with the State of the Art: we provide high-speed and ultra-lightweight models, as well as cutting-edge technology.
- 🏆 Streaming ASR and TTS System: we provide production-ready streaming ASR and streaming TTS systems.
- 💯 Rule-based Chinese frontend: our frontend contains Text Normalization and Grapheme-to-Phoneme conversion (G2P, including Polyphone and Tone Sandhi handling). Moreover, we use self-defined linguistic rules to adapt to the Chinese context.
- 📦 Varieties of Functions that Vitalize both Industry and Academia:
  - 🛎️ *Implementation of critical audio tasks*: this toolkit contains audio functions such as Automatic Speech Recognition, Text-to-Speech Synthesis, Speaker Verification, Keyword Spotting, Audio Classification, Speech Translation, etc.
  - 🔬 *Integration of mainstream models and datasets*: the toolkit implements modules that participate in the whole pipeline of the speech tasks, and uses mainstream datasets such as LibriSpeech, LJSpeech, AIShell, CSMSC, etc. See also [model list](#model-list) for more details.
  - 🧩 *Cascaded models application*: as an extension of typical traditional audio tasks, we combine the workflows of the aforementioned tasks with other fields like Natural Language Processing (NLP) and Computer Vision (CV).
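The G2P step in the Chinese frontend has to resolve polyphones (characters with several readings, such as 行 in 银行 "bank" versus 行走 "to walk") and apply tone sandhi (for example, a third tone directly followed by another third tone is pronounced as a second tone, so 你好 *ni3 hao3* is spoken *ni2 hao3*). The sketch below illustrates both ideas with a toy, hypothetical lexicon, greedy longest-match lookup, and a simplified third-tone rule. It is not PaddleSpeech's frontend; real frontends rely on much larger dictionaries and also need rules for syllables such as 一 and 不.

```python
from typing import Dict, List

# Toy, hypothetical lexicon. Word entries resolve polyphones from context:
# 行 is read hang2 in 银行 ("bank") but xing2 in 行走 ("to walk").
WORD_LEXICON: Dict[str, List[str]] = {
    "银行": ["yin2", "hang2"],
    "行走": ["xing2", "zou3"],
    "你好": ["ni3", "hao3"],
}
# Per-character fallback readings (pinyin with tone numbers).
CHAR_LEXICON: Dict[str, str] = {
    "银": "yin2", "行": "xing2", "走": "zou3",
    "你": "ni3", "好": "hao3", "我": "wo3", "很": "hen3",
}
MAX_WORD_LEN = max(len(word) for word in WORD_LEXICON)


def g2p(text: str) -> List[str]:
    """Greedy longest-match lookup: multi-character words first, then single characters."""
    result: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if size > 1 and chunk in WORD_LEXICON:
                result.extend(WORD_LEXICON[chunk])
                break
            if size == 1:
                result.append(CHAR_LEXICON[chunk])  # KeyError for unknown characters
        i += size
    return result


def third_tone_sandhi(syllables: List[str]) -> List[str]:
    """Simplified rule: a 3rd tone directly before another 3rd tone becomes a 2nd tone.

    The check uses the original (pre-sandhi) tones, so in a run of 3rd tones
    every syllable except the last becomes a 2nd tone.
    """
    out = list(syllables)
    for k in range(len(syllables) - 1):
        if syllables[k].endswith("3") and syllables[k + 1].endswith("3"):
            out[k] = syllables[k][:-1] + "2"
    return out


print(g2p("银行"), g2p("行走"))
print(third_tone_sandhi(g2p("你好")))
print(third_tone_sandhi(g2p("我很好")))
```

Real third-tone sandhi depends on word and prosodic grouping, so a phrase like 我很好 can also be read *wo3 hen2 hao3*; the flat left-to-right rule above is only the simplest approximation.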
### Recent Update
- 🎉 2025.09.01: Add Whisper large-v3 and turbo models.
- 🤗 2025.08.11: Add [code-switch online model and server demo](./examples/tal_cs/asr1/).
- 👑 2023.05.31: Add WavLM ASR-en, WavLM fine-tuning for ASR on LibriSpeech.
- 🎉 2023.05.18: Add Squeezeformer, Squeezeformer training for ASR on Aishell.
- 👑 2023.05.04: Add HuBERT ASR-en, HuBERT fine-tuning for ASR on LibriSpeech.
- ⚡ 2023.04.28: Fix 0-d tensor issues: with the upgrade to paddlepaddle==2.5, the problem of modifying 0-d tensors has been solved.
- 👑 2023.04.25: Add AMP for U2 conformer.
- 🔥 2023.04.06: Add [subtitle file (.srt format) generation example](./demos/streaming_asr_server).
- 🔥 2023.03.14: Add SVS (Singing Voice Synthesis) examples with the Opencpop dataset, including [DiffSinger](./examples/opencpop/svs1), [PWGAN](./examples/opencpop/voc1) and [HiFiGAN](./examples/opencpop/voc5); the quality is being continuously optimized.
- 👑 2023.03.09: Add [Wav2vec2ASR-zh](./examples/aishell/asr3).
- 🎉 2023.03.07: Add [TTS ARM Linux C++ Demo (with C++ Chinese Text Frontend)](./demos/TTSArmLinux).
- 🔥 2023.03.03: Add Voice Conversion [StarGANv2-VC synthesis pipeline](./examples/vctk/vc3).
- 🎉 2023.02.16: Add [Cantonese TTS](./examples/canton/tts3).
- 🔥 2023.01.10: Add [code-switch ASR CLI and Demos](./demos/speech_recognition).
- 👑 2023.01.06: Add [code-switch ASR tal_cs recipe](./examples/tal_cs/asr1/).
- 🎉 2022.12.02: Add [end-to-end Prosody Prediction pipeline](./examples/csmsc/tts3_rhy) (including using prosody labels in Acoustic Model).
- 🎉 2022.11.30: Add [TTS Android Demo](./demos/TTSAndroid).
- 🤗 2022.11.28: PP-TTS and PP-ASR demos are available in AIStudio and on the [official website of PaddlePaddle](https://www.paddlepaddle.org.cn/models).
- 👑 2022.11.18: Add Whisper CLI and Demos, supporting multi-language recognition and translation.
- 🔥 2022.11.18: Add [Wav2vec2 CLI and Demos](./demos/speech_ssl), supporting ASR and Feature Extraction.
- 🎉 2022.11.17: Add male voice for TTS.
- 🔥 2022.11.07: Add U2/U2++ C++ High-Performance Streaming ASR Deployment.
- 👑 2022.11.01: Add…
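The 2023.04.06 entry adds a subtitle (.srt) generation example to the streaming ASR server demo. The SRT format itself is simple: numbered cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by its text, separated by blank lines. The sketch below renders recognized segments in that format. It assumes the recognizer already yields `(start_seconds, end_seconds, text)` tuples; that tuple shape, the function names, and the timings in the example are hypothetical and not taken from the demo.

```python
from typing import List, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as 1-based numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        cues.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(cues)


# Made-up timings for the recognition sample shown earlier.
segments = [
    (0.0, 2.5, "我认为跑步最重要的就是"),
    (2.5, 4.8, "给我带来了身体健康。"),
]
print(to_srt(segments))
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the printed fields.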