Repo: Baidu (ERNIE), published Nov 14, 2017

PaddlePaddle/PaddleSpeech

Python

Description: Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Language: Python

License: Apache-2.0

Stars: 12614

Forks: 1957

Open issues: 271

Created: 2017-11-14T12:36:30Z

Pushed: 2026-06-10T06:42:49Z

Default branch: develop

Fork: no

Archived: no

README: ([简体中文 (Simplified Chinese)](./README_cn.md) | English)

------------------------------------------------------------------------------------

PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for a variety of critical tasks in speech and audio, featuring state-of-the-art and influential models.

PaddleSpeech won the NAACL 2022 Best Demo Award; please check out our paper on arXiv.

##### Speech Recognition

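| Input Audio | Recognition Result |
| :---: | :---: |
| *(audio)* | I knocked at the door on the ancient side of the building. |
| *(audio)* | 我认为跑步最重要的就是给我带来了身体健康。 (I think the most important thing running has brought me is good health.) |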

##### Speech Translation (English to Chinese)

| Input Audio | Translation Result |
| :---: | :---: |
| *(audio)* | 我 在 这栋 建筑 的 古老 门上 敲门。 (I knocked on the ancient door of this building.) |

##### Text-to-Speech

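| Input Text | Synthetic Audio |
| :--- | :---: |
| Life was like a box of chocolates, you never know what you're gonna get. | *(audio)* |
| 早上好,今天是2020/10/29,最低温度是-3°C。 (Good morning, today is 2020/10/29, and the lowest temperature is -3°C.) | *(audio)* |
| 季姬寂,集鸡,鸡即棘鸡。棘鸡饥叽,季姬及箕稷济鸡。鸡既济,跻姬笈,季姬忌,急咭鸡,鸡急,继圾几,季姬急,即籍箕击鸡,箕疾击几伎,伎即齑,鸡叽集几基,季姬急极屐击鸡,鸡既殛,季姬激,即记《季姬击鸡记》。 (a classical Chinese tongue twister in which every syllable is pronounced *ji*) | *(audio)* |
| 大家好,我是 parrot 虚拟老师,我们来读一首诗,我与春风皆过客,I and the spring breeze are passing by,你携秋水揽星河,you take the autumn water to take the galaxy。 (Hello everyone, I am parrot, a virtual teacher; let's read a poem.) | *(audio)* |
| 宜家唔系事必要你讲,但系你所讲嘅说话将会变成呈堂证供。 (Cantonese: You are not required to say anything now, but what you say will be used as evidence in court.) | *(audio)* |
| 各个国家有各个国家嘅国歌 (Cantonese: Every country has its own national anthem.) | *(audio)* |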

For more synthesized audio samples, please refer to the PaddleSpeech Text-to-Speech samples.
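Inputs such as "2020/10/29" and "-3°C" in the examples above have to be rewritten as words before they can be synthesized; this is the Text Normalization step of a Chinese TTS frontend. The sketch below is a minimal rule-based illustration that assumes just two hypothetical rules (YYYY/MM/DD dates and integer Celsius temperatures) and one common reading for each; it is not PaddleSpeech's frontend, and the function names are illustrative only.

```python
import re

DIGITS = "零一二三四五六七八九"


def read_number(n: int) -> str:
    """Read an integer in [0, 99] as Chinese words (hypothetical helper)."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0-99")
    if n < 10:
        return DIGITS[n]
    tens, ones = divmod(n, 10)
    # 10-19 are read as 十X rather than 一十X
    prefix = "十" if tens == 1 else DIGITS[tens] + "十"
    return prefix + (DIGITS[ones] if ones else "")


def read_year(year: str) -> str:
    """Years are read digit by digit: 2020 -> 二零二零."""
    return "".join(DIGITS[int(d)] for d in year)


def normalize(text: str) -> str:
    """Rewrite YYYY/MM/DD dates and 1-2 digit °C temperatures as Chinese words."""
    text = re.sub(
        r"(\d{4})/(\d{1,2})/(\d{1,2})",
        lambda m: read_year(m.group(1)) + "年"
        + read_number(int(m.group(2))) + "月"
        + read_number(int(m.group(3))) + "日",
        text,
    )
    # (?<!\d) stops the rule from matching the tail of a longer number such as 123°C
    text = re.sub(
        r"(?<!\d)(-?)(\d{1,2})°C",
        lambda m: ("零下" if m.group(1) else "") + read_number(int(m.group(2))) + "摄氏度",
        text,
    )
    return text


print(normalize("早上好,今天是2020/10/29,最低温度是-3°C。"))
# → 早上好,今天是二零二零年十月二十九日,最低温度是零下三摄氏度。
```

A real frontend needs many more rules (phone numbers, fractions, units, ranges), and often context to choose between readings; regular expressions applied in a fixed order are simply the easiest way to show the idea.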

##### Punctuation Restoration

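| Input Text | Output Text |
| :---: | :---: |
| 今天的天气真不错啊你下午有空吗我想约你一起去吃饭 | 今天的天气真不错啊!你下午有空吗?我想约你一起去吃饭。 (The weather is really nice today! Are you free this afternoon? I'd like to invite you to dinner.) |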
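A common way to frame punctuation restoration is as sequence labeling: a model predicts, for each character, which punctuation mark (if any) should follow it, and the output text is rebuilt from those labels. The sketch below shows only that rebuilding step, assuming hypothetical per-character labels for the example above; it does not call a PaddleSpeech model, and `apply_punctuation` is an illustrative name.

```python
from typing import List, Optional


def apply_punctuation(text: str, labels: List[Optional[str]]) -> str:
    """Insert the predicted punctuation mark (or nothing) after each character."""
    if len(text) != len(labels):
        raise ValueError("need exactly one label per character")
    out = []
    for ch, punc in zip(text, labels):
        out.append(ch)
        if punc is not None:
            out.append(punc)
    return "".join(out)


raw = "今天的天气真不错啊你下午有空吗我想约你一起去吃饭"
# Hypothetical model predictions: None means "no punctuation after this character".
labels: List[Optional[str]] = [None] * len(raw)
labels[raw.index("啊")] = "!"
labels[raw.index("吗")] = "?"
labels[-1] = "。"
print(apply_punctuation(raw, labels))
# → 今天的天气真不错啊!你下午有空吗?我想约你一起去吃饭。
```

Because each label attaches to the character before it, the text itself is never altered, only augmented, which keeps the output aligned with the ASR transcript.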

### Features

Through an easy-to-use, efficient, flexible, and scalable implementation, our vision is to empower both industrial applications and academic research, covering training, inference and testing modules as well as deployment. Specifically, this toolkit features:

  • 📦 Ease of Use: installation has a low barrier, and a [CLI](#quick-start), [Server](#quick-start-server), and [Streaming Server](#quick-start-streaming-server) are available to quick-start your journey.
  • 🏆 Aligned with the State of the Art: we provide high-speed and ultra-lightweight models, as well as cutting-edge technology.
  • 🏆 Streaming ASR and TTS Systems: we provide production-ready streaming ASR and streaming TTS systems.
  • 💯 Rule-based Chinese Frontend: our frontend contains Text Normalization and Grapheme-to-Phoneme conversion (G2P, including polyphone disambiguation and tone sandhi). Moreover, we use self-defined linguistic rules to adapt to the Chinese context.
  • 📦 A Variety of Functions for both Industry and Academia:
      • 🛎️ *Implementation of critical audio tasks*: this toolkit covers audio tasks such as Automatic Speech Recognition, Text-to-Speech Synthesis, Speaker Verification, Keyword Spotting, Audio Classification, and Speech Translation.
      • 🔬 *Integration of mainstream models and datasets*: the toolkit implements modules covering the whole pipeline of each speech task and uses mainstream datasets such as LibriSpeech, LJSpeech, AIShell, and CSMSC. See also [model list](#model-list) for more details.
      • 🧩 *Cascaded models application*: as an extension of typical audio tasks, we combine the workflows of the aforementioned tasks with other fields such as Natural Language Processing (NLP) and Computer Vision (CV).
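The G2P step in the Chinese frontend has to handle tone sandhi, where a syllable's tone changes depending on the syllable that follows it. The sketch below applies two textbook Mandarin rules to tone-numbered pinyin: a third tone before another third tone becomes a second tone, and 不 (bu4) before a fourth tone becomes bu2. It is a deliberately simplified illustration, not PaddleSpeech's implementation; a production frontend also needs word segmentation, handling of 一, and neutral tones, all of which this hypothetical `apply_tone_sandhi` ignores.

```python
from typing import List


def apply_tone_sandhi(syllables: List[str]) -> List[str]:
    """Apply two simplified Mandarin tone-sandhi rules.

    Syllables are tone-numbered pinyin such as "ni3"; the last character is the tone.
    Rule 1: tone 3 followed by tone 3 -> the first becomes tone 2.
    Rule 2: "bu4" followed by tone 4 -> "bu2".
    """
    out = list(syllables)
    for i in range(len(out) - 1):
        base, tone = out[i][:-1], out[i][-1]
        # out[i + 1] has not been modified yet, so this reads the original next tone
        next_tone = out[i + 1][-1]
        if tone == "3" and next_tone == "3":
            out[i] = base + "2"
        elif out[i] == "bu4" and next_tone == "4":
            out[i] = "bu2"
    return out


print(apply_tone_sandhi(["ni3", "hao3"]))  # 你好 → ['ni2', 'hao3']
print(apply_tone_sandhi(["bu4", "yao4"]))  # 不要 → ['bu2', 'yao4']
```

Scanning left to right turns a run of three third tones such as "wo3 hen3 hao3" into "wo2 hen2 hao3"; real speech can differ depending on how the words group together, which is exactly why frontends add linguistic rules on top of a pattern like this.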

### Recent Update

  • 🎉 2025.09.01: Add Whisper large-v3 and turbo models.
  • 🤗 2025.08.11: Add [code-switch online model and server demo](./examples/tal_cs/asr1/).
  • 👑 2023.05.31: Add WavLM ASR-en, WavLM fine-tuning for ASR on LibriSpeech.
  • 🎉 2023.05.18: Add Squeezeformer, Squeezeformer training for ASR on Aishell.
  • 👑 2023.05.04: Add HuBERT ASR-en, HuBERT fine-tuning for ASR on LibriSpeech.
  • ⚡ 2023.04.28: Fix 0-d tensor issues: with the upgrade to paddlepaddle==2.5, the problem of modifying 0-d tensors has been solved.
  • 👑 2023.04.25: Add AMP for U2 conformer.
  • 🔥 2023.04.06: Add [subtitle file (.srt format) generation example](./demos/streaming_asr_server).
  • 🔥 2023.03.14: Add SVS (Singing Voice Synthesis) examples with the Opencpop dataset, including [DiffSinger](./examples/opencpop/svs1), [PWGAN](./examples/opencpop/voc1), and [HiFiGAN](./examples/opencpop/voc5); their quality is being continuously improved.
  • 👑 2023.03.09: Add [Wav2vec2ASR-zh](./examples/aishell/asr3).
  • 🎉 2023.03.07: Add [TTS ARM Linux C++ Demo (with C++ Chinese Text Frontend)](./demos/TTSArmLinux).
  • 🔥 2023.03.03: Add Voice Conversion [StarGANv2-VC synthesis pipeline](./examples/vctk/vc3).
  • 🎉 2023.02.16: Add [Cantonese TTS](./examples/canton/tts3).
  • 🔥 2023.01.10: Add [code-switch ASR CLI and demos](./demos/speech_recognition).
  • 👑 2023.01.06: Add [code-switch ASR tal_cs recipe](./examples/tal_cs/asr1/).
  • 🎉 2022.12.02: Add [end-to-end Prosody Prediction pipeline](./examples/csmsc/tts3_rhy) (including the use of prosody labels in the acoustic model).
  • 🎉 2022.11.30: Add [TTS Android Demo](./demos/TTSAndroid).
  • 🤗 2022.11.28: PP-TTS and PP-ASR demos are available in AIStudio and on the [official website of PaddlePaddle](https://www.paddlepaddle.org.cn/models).
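The 2023.04.06 entry adds subtitle (.srt) generation from ASR output. The SRT format itself is simple: numbered cues, each with a `start --> end` line using `HH:MM:SS,mmm` timestamps, followed by the text and a blank line. The sketch below assumes the recognizer yields hypothetical `(start_sec, end_sec, text)` segments; the segment timing and function names are made up for illustration and are not taken from the PaddleSpeech demo.

```python
from typing import List, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: List[Tuple[float, float, str]]) -> str:
    """Turn (start_sec, end_sec, text) segments into an SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # Joining with "\n" leaves the blank line SRT expects between cues
    return "\n".join(blocks)


# Hypothetical segment; the 2.48 s end time is invented for the example.
print(to_srt([(0.0, 2.48, "我认为跑步最重要的就是给我带来了身体健康。")]))
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the millisecond field.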
