togethercomputer/Kokoro-FastAPI
forked from remsky/Kokoro-FastAPI
Captured source
source ↗togethercomputer/Kokoro-FastAPI
Description: Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/CPU ONNX and NVIDIA GPU PyTorch support, handling, and auto-stitching
Language: Python
License: Apache-2.0
Stars: 0
Forks: 0
Open issues: 0
Created: 2025-06-26T10:43:56Z
Pushed: 2025-07-06T04:18:29Z
Default branch: master
Fork: yes
Parent repository: remsky/Kokoro-FastAPI
Archived: no
README:
_FastKoko_
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
- Multi-language support (English, Japanese, Chinese, _Vietnamese soon_)
- OpenAI-compatible Speech endpoint, NVIDIA GPU accelerated or CPU inference with PyTorch
- ONNX support coming soon, see v0.1.5 and earlier for legacy ONNX support in the interim
- Debug endpoints for monitoring system stats, integrated web UI on localhost:8880/web
- Phoneme-based audio generation, phoneme generation
- Per-word timestamped caption generation
- Voice mixing with weighted combinations
Integration Guides
Get Started
Quickest Start (docker run)
Pre built images are available to run, with arm/multi-arch support, and baked in models Refer to the core/config.py file for a full list of variables which can be managed via the environment
# the `latest` tag can be used, though it may have some unexpected bonus features which impact stability. Named versions should be pinned for your regular usage. Feedback/testing is always welcome docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest # CPU, or: docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:latest #NVIDIA GPU
Quick Start (docker compose)
1. Install prerequisites, and start the service using Docker Compose (Full setup including UI):
- Install Docker
- Clone the repository:
git clone https://github.com/remsky/Kokoro-FastAPI.git cd Kokoro-FastAPI cd docker/gpu # For GPU support # or cd docker/cpu # For CPU support docker compose up --build # *Note for Apple Silicon (M1/M2) users: # The current GPU build relies on CUDA, which is not supported on Apple Silicon. # If you are on an M1/M2/M3 Mac, please use the `docker/cpu` setup. # MPS (Apple's GPU acceleration) support is planned but not yet available. # Models will auto-download, but if needed you can manually download: python docker/scripts/download_model.py --output api/src/models/v1_0 # Or run directly via UV: ./start-gpu.sh # For GPU support ./start-cpu.sh # For CPU support
Direct Run (via uv)
1. Install prerequisites ():
- Install astral-uv
- Install espeak-ng in your system if you want it available as a fallback for unknown words/sounds. The upstream libraries may attempt to handle this, but results have varied.
- Clone the repository:
git clone https://github.com/remsky/Kokoro-FastAPI.git cd Kokoro-FastAPI
Run the model download script if you haven't already
Start directly via UV (with hot-reload)
Linux and macOS
./start-cpu.sh OR ./start-gpu.sh
Windows
.\start-cpu.ps1 OR .\start-gpu.ps1
Up and Running?
Run locally as an OpenAI-Compatible Speech Endpoint
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8880/v1", api_key="not-needed"
)
with client.audio.speech.with_streaming_response.create(
model="kokoro",
voice="af_sky+af_bella", #single or multiple voicepack combo
input="Hello world!"
) as response:
response.stream_to_file("output.mp3")- The API will be available at http://localhost:8880
- API Documentation: http://localhost:8880/docs
- Web Interface: http://localhost:8880/web
Features
OpenAI-Compatible Speech Endpoint
# Using OpenAI's Python library
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8880/v1", api_key="not-needed")
response = client.audio.speech.create(
model="kokoro",
voice="af_bella+af_sky", # see /api/src/core/openai_mappings.json to customize
input="Hello world!",
response_format="mp3"
)
response.stream_to_file("output.mp3")Or Via Requests:
import requests
response = requests.get("http://localhost:8880/v1/audio/voices")
voices = response.json()["voices"]
# Generate audio
response = requests.post(
"http://localhost:8880/v1/audio/speech",
json={
"model": "kokoro",
"input": "Hello world!",
"voice": "af_bella",
"response_format": "mp3", # Supported: mp3, wav, opus, flac
"speed": 1.0
}
)
# Save audio
with open("output.mp3", "wb") as f:
f.write(response.content)Quick tests (run from another terminal):
python examples/assorted_checks/test_openai/test_openai_tts.py # Test OpenAI Compatibility python examples/assorted_checks/test_voices/test_all_voices.py # Test all available voices
Voice Combination
- Weighted voice combinations using ratios (e.g., "af_bella(2)+af_heart(1)" for 67%/33% mix)
- Ratios are automatically normalized to sum to 100%
- Available through any endpoint by adding weights in parentheses
- Saves generated voicepacks for future use
Combine voices and generate audio:
import requests
response = requests.get("http://localhost:8880/v1/audio/voices")
voices = response.json()["voices"]
# Example 1: Simple voice combination (50%/50% mix)
response = requests.post(
"http://localhost:8880/v1/audio/speech",
json={
"input": "Hello world!",
"voice": "af_bella+af_sky", # Equal weights
"response_format": "mp3"
}
)
# Example 2: Weighted voice combination (67%/33% mix)
response = requests.post(
"http://localhost:8880/v1/audio/speech",
json={
"input": "Hello world!",
"voice": "af_bella(2)+af_sky(1)", # 2:1 ratio = 67%/33%
"response_format": "mp3"
}
)
# Example 3: Download combined voice as .pt file
response = requests.post(
"http://localhost:8880/v1/audio/voices/combine",
json="af_bella(2)+af_sky(1)" # 2:1 ratio = 67%/33%
)
# Save the .pt file
with open("combined_voice.pt", "wb") as f:
f.write(response.content)
# Use the downloaded voice file
response = requests.post(
"http://localhost:8880/v1/audio/speech",
json={
"input": "Hello world!",
"voice": "combined_voice", # Use the saved voice file
"response_format": "mp3"
}
)Multiple Output Audio Formats
- mp3
- wav
- opus
- flac
- m4a
- pcm
Streaming Support
# OpenAI-compatible streaming from openai import…
Excerpt shown — open the source for the full document.
Notability
notability 2.0/10Routine fork of existing repo