siliconflow/ComfyUI-FishAudioS2
forked from Saganaki22/ComfyUI-FishAudioS2
Description: ComfyUI custom nodes for Fish Audio S2-Pro TTS — voice clone, multi-speaker, and text-to-speech
License: NOASSERTION
Stars: 0
Forks: 0
Open issues: 1
Created: 2026-03-30T05:44:20Z
Pushed: 2026-03-30T06:49:22Z
Default branch: main
Fork: yes
Parent repository: Saganaki22/ComfyUI-FishAudioS2
Archived: no
README:
---
https://github.com/user-attachments/assets/d69377a6-1c28-40d0-a61a-ba27237e6801
---
🎵 Overview
Fish Audio S2 Pro is a state-of-the-art text-to-speech model with fine-grained inline control of prosody and emotion. Trained on 10M+ hours of audio data across 83 languages with 1500+ emotive tags, it combines reinforcement learning alignment with a Dual-Autoregressive architecture for speech that sounds natural, realistic, and emotionally rich.
Paper: Fish Audio S2 Technical Report (arXiv:2603.08823)
This ComfyUI wrapper provides native node-based integration with:
- Zero-shot voice cloning from 10-30 second reference audio
- Inline emotion/prosody control with `[tag]` syntax
- Multi-speaker conversation synthesis in a single pass
- Per-speaker audio isolation for multi-speaker lip sync workflows
- Support for 83 languages with automatic language detection
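As a rough illustration of the `[tag]` syntax, the hypothetical helper below (not part of this node) lists the inline tags in a prompt and returns the text with the tags stripped, which can be handy for proofreading a script before synthesis. It assumes a tag is a single bracketed token of letters, digits, spaces, underscores, or hyphens; the exact tag grammar the model accepts may differ.

```python
import re

# Assumption: an inline tag is a bracketed run of word characters, spaces, or
# hyphens, e.g. "[laugh]" or "[excited]". The real model may accept other forms.
TAG_PATTERN = re.compile(r"\[([\w\- ]+)\]")


def split_tags(text: str) -> tuple[list[str], str]:
    """Return (tags in order of appearance, text with the tags removed)."""
    tags = TAG_PATTERN.findall(text)
    plain = TAG_PATTERN.sub("", text)
    # Collapse the extra whitespace left where tags were removed.
    plain = re.sub(r"\s+", " ", plain).strip()
    return tags, plain


if __name__ == "__main__":
    prompt = "[excited] We did it! [laugh] I can't believe it. [whisper] Don't tell anyone."
    tags, plain = split_tags(prompt)
    print(tags)   # ['excited', 'laugh', 'whisper']
    print(plain)  # We did it! I can't believe it. Don't tell anyone.
```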
---
✨ Features
- Zero-Shot Voice Cloning – Clone any voice from 10-30 seconds of reference audio
- 1500+ Emotive Tags – Fine-grained control with `[laugh]`, `[whisper]`, `[excited]`, `[sad]`, etc.
- 83 Languages – Full multilingual support without phoneme preprocessing
- Multi-Speaker TTS – Generate conversations with multiple cloned voices in one pass
- Per-Speaker Audio Isolation – Separate audio tracks for each speaker (lip sync workflows)
- Native ComfyUI Integration – AUDIO noodle inputs, progress bars, interruption support
- Optimized Performance – Support for bf16/fp16/fp32 dtypes, SDPA, FlashAttention, SageAttention
- Smart Auto-Download – Model weights auto-downloaded from HuggingFace on first use
- Smart Caching – Optional model caching with automatic unloading on config change
---
Requirements
- GPU: NVIDIA GPU with 24GB+ VRAM for the full model (RTX 3090/4090, A5000, etc.)
  - 20GB+ VRAM works with the FP8 quantized model (`s2-pro-fp8`, ~15 it/s, requires an RTX 4090/5090 or other Ada/Blackwell GPU)
  - 18GB+ VRAM works with BNB INT8 on-the-fly quantization (~10-11 it/s)
  - 16GB+ VRAM works with BNB NF4 4-bit on-the-fly quantization (~10-11 it/s)
- CPU/MPS: ⚠️ EXPERIMENTAL, ~1.5-2 seconds per token
- Python: 3.10+
- CUDA: 11.8+ (for GPU inference)
> ⚠️ BNB On-the-Fly Quantization Requirements:
>
> The BNB INT8 and BNB NF4 options use the `s2-pro` (bf16) model and quantize it on the fly via bitsandbytes.
>
> Install bitsandbytes:
> ```bash
> pip install bitsandbytes
> ```
>
> Note: BNB options run at ~10-11 it/s vs ~15 it/s for FP8. They work on any NVIDIA GPU without special hardware requirements.
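The VRAM tiers above boil down to a simple decision rule. The sketch below is hypothetical (the node does not expose such a function): it assumes you already know the available VRAM in GB, whether the GPU supports FP8 (Ada/Blackwell), and whether bitsandbytes is installed. The option strings mirror the names in the Models table, and `"cpu (experimental)"` is an illustrative label, not a real node option.

```python
import importlib.util


def has_bitsandbytes() -> bool:
    """True if the optional bitsandbytes package is importable."""
    return importlib.util.find_spec("bitsandbytes") is not None


def pick_model_option(vram_gb: float, fp8_capable: bool, bnb_installed: bool) -> str:
    """Pick the least-compressed option that fits, per the README's VRAM tiers.

    Hypothetical helper; the thresholds are the approximate figures listed above.
    """
    if vram_gb >= 24:
        return "s2-pro"          # full bf16 model
    if vram_gb >= 20 and fp8_capable:
        return "s2-pro-fp8"      # needs an Ada/Blackwell GPU
    if vram_gb >= 18 and bnb_installed:
        return "BNB INT8"
    if vram_gb >= 16 and bnb_installed:
        return "BNB NF4"
    # Below every GPU tier, only the experimental CPU/MPS path remains.
    return "cpu (experimental)"


if __name__ == "__main__":
    # Example: a 20GB card without FP8 support, using whatever is installed here.
    print(pick_model_option(20, fp8_capable=False, bnb_installed=has_bitsandbytes()))
```

Checking FP8 before BNB reflects the speed figures in this section: at 20GB+ on Ada/Blackwell, FP8 (~15 it/s) beats BNB (~10-11 it/s) and needs no extra package.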
---
Models
| Model | VRAM | Speed | Description |
|-------|------|-------|-------------|
| s2-pro | ~24GB | ~15-17 it/s | Full bf16 model (4B params), best quality, works out of the box. 15 it/s baseline, 17 it/s with SageAttention |
| s2-pro-fp8 | ~20GB | ~15 it/s | FP8 weight-only quantized, recommended for 20GB+ Ada/Blackwell GPUs (RTX 4090/5090), no extra dependencies |
| BNB INT8 | ~18GB | ~10-11 it/s | On-the-fly INT8 quantization via bitsandbytes; uses the s2-pro model, requires bitsandbytes |
| BNB NF4 | ~16GB | ~10-11 it/s | On-the-fly 4-bit NF4 quantization via bitsandbytes; uses the s2-pro model, requires bitsandbytes |
Models are auto-downloaded from HuggingFace on first use:
- fishaudio/s2-pro — full model
- drbaph/s2-pro-fp8 — FP8 quantized
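The download mapping can be pictured as a small registry: each option resolves to a HuggingFace repo, and both BNB options reuse `fishaudio/s2-pro` and quantize it after loading. The `ModelOption`, `MODEL_OPTIONS`, and `download_weights` names are hypothetical; `huggingface_hub.snapshot_download` is the standard Hub download call, imported lazily so the registry can be inspected without that package. The node's own download logic, cache location, and file layout may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    repo_id: str               # HuggingFace repo holding the weights
    needs_bitsandbytes: bool   # True for on-the-fly BNB quantization


# Hypothetical registry built from the Models table above.
MODEL_OPTIONS = {
    "s2-pro": ModelOption("fishaudio/s2-pro", needs_bitsandbytes=False),
    "s2-pro-fp8": ModelOption("drbaph/s2-pro-fp8", needs_bitsandbytes=False),
    "BNB INT8": ModelOption("fishaudio/s2-pro", needs_bitsandbytes=True),
    "BNB NF4": ModelOption("fishaudio/s2-pro", needs_bitsandbytes=True),
}


def download_weights(option: str, local_dir: Optional[str] = None) -> str:
    """Fetch the weights for `option` and return the local path.

    Files already present are reused, so only the first call actually downloads.
    """
    spec = MODEL_OPTIONS[option]
    # Imported here so the registry is usable even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=spec.repo_id, local_dir=local_dir)
```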
---
Tested Configurations
v0.4.0 has been tested and works with PyTorch 2.10+cu13.
| | Standalone env | Shared ComfyUI env | FP8 (RTX 4090/5090) |
|---|---|---|---|
| Python | 3.10 – 3.13 | 3.10 – 3.13 | 3.10 – 3.13 |
| PyTorch | 2.x + CUDA 11.8+ | managed by ComfyUI | 2.x + CUDA 11.8+ |
| torchaudio | any (2.9+ supported) | any (2.9+ supported) | any (2.9+ supported) |
| protobuf | any (not touched) | any (not touched) | any (not touched) |
| descript-audio-codec | 1.0.0 (--no-deps) | 1.0.0 (--no-deps) | 1.0.0 (--no-deps) |
| descript-audiotools | 0.7.2 (--no-deps) | 0.7.2 (--no-deps) | 0.7.2 (--no-deps) |
| transformers | ≥4.45.2 | ≥4.45.2 | ≥4.45.2 |
| bitsandbytes | optional (NF4/INT8) | optional (NF4/INT8) | not needed |
| VRAM | 24GB+ / 16GB+ (BNB) | 24GB+ / 16GB+ (BNB) | 20GB+ (Ada/Blackwell) |
| GPU | any NVIDIA | any NVIDIA | RTX 4090/5090 or Ada/Blackwell |
> As of v0.3.0, `descript-audio-codec`, `descript-audiotools`, and `protobuf` are never installed or modified by `pip install -r requirements.txt`. The two audio packages are auto-installed at first startup with `--no-deps`, leaving your environment's protobuf version untouched.
>
> As of v0.3.6, all transitive runtime dependencies of dac/audiotools (flatten-dict, importlib-resources, julius, randomname, ffmpy, argbind) are also auto-installed, fixing fresh-install failures on clean portable environments.
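To compare an environment against the tested configurations, a short check with `importlib.metadata` is enough. This is a hypothetical pre-flight script, not something the node ships; the minimums (Python 3.10, transformers 4.45.2, descript-audiotools 0.7.2) come from the table above, and the naive parser only understands plain numeric release segments, so pre-release suffixes are ignored.

```python
import sys
from importlib import metadata
from typing import Optional

# Minimums from the "Tested Configurations" table; None = only report presence.
MINIMUMS = {
    "transformers": "4.45.2",
    "descript-audiotools": "0.7.2",
    "descript-audio-codec": None,
    "bitsandbytes": None,  # optional, only needed for BNB INT8/NF4
}


def parse_version(text: str) -> tuple[int, ...]:
    """Naive parse: '2.10.0+cu130' -> (2, 10, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check(name: str, minimum: Optional[str]) -> str:
    """Describe whether `name` is installed and meets `minimum`."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return f"{name}: not installed"
    if minimum and parse_version(installed) < parse_version(minimum):
        return f"{name}: {installed} (needs >= {minimum})"
    return f"{name}: {installed} ok"


if __name__ == "__main__":
    py_ok = sys.version_info >= (3, 10)
    print(f"python: {sys.version.split()[0]}", "ok" if py_ok else "(needs >= 3.10)")
    for pkg, minimum in MINIMUMS.items():
        print(check(pkg, minimum))
```

For production use, `packaging.version.Version` handles pre-releases and local versions correctly; the hand-rolled parser just keeps the sketch dependency-free.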
---
Installation
Method 1: ComfyUI Manager (Recommended)
1. Open ComfyUI Manager
2. Search for "FishAudioS2"
3. Click Install
4. Restart ComfyUI
Method 2: Manual Installation
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/saganaki22/ComfyUI-FishAudioS2.git
cd ComfyUI-FishAudioS2
pip install -r requirements.txt
```
> Note: `descript-audio-codec` and `descript-audiotools` are not in `requirements.txt` on purpose. They are auto-installed by the node at ComfyUI startup with `--no-deps` so that their protobuf dependency does not alter your environment's protobuf version.
>
> If auto-install fails at startup, install them manually **with `--no-deps`** (omitting this flag can break other ComfyUI nodes that need protobuf 5.x):
> ```bash
> pip install descript-audio-codec --no-deps
> pip install "descript-audiotools>=0.7.2" --no-deps
> ```
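The startup auto-install described in this note can be approximated as: check whether the audio packages can be imported, and if not, run pip with `--no-deps` in the same interpreter ComfyUI uses. This is a hypothetical sketch, not the node's actual code. The import names `dac` and `audiotools` are assumed to be the modules these distributions provide, and the sketch omits the transitive dependencies (flatten-dict, julius, etc.) that the node also installs since v0.3.6.

```python
import importlib.util
import subprocess
import sys

# (import name, pip requirement). The import names are assumptions.
AUDIO_PACKAGES = [
    ("dac", "descript-audio-codec"),
    ("audiotools", "descript-audiotools>=0.7.2"),
]


def missing_requirements() -> list[str]:
    """Requirements whose import module cannot be found in this interpreter."""
    return [req for module, req in AUDIO_PACKAGES if importlib.util.find_spec(module) is None]


def pip_command(requirements: list[str]) -> list[str]:
    """Build a pip call that installs only the named packages, never their deps."""
    # `sys.executable -m pip` targets the same environment ComfyUI is running in.
    return [sys.executable, "-m", "pip", "install", "--no-deps", *requirements]


def ensure_audio_packages() -> None:
    """Install any missing audio package; --no-deps leaves protobuf untouched."""
    missing = missing_requirements()
    if missing:
        subprocess.check_call(pip_command(missing))


if __name__ == "__main__":
    ensure_audio_packages()
```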
> [!CAUTION]
> Never run `pip install git+https://github.com/fishaudio/fish-speech`.
> fish-speech is already bundled inside this node. Running that command will downgrade PyTorch and other core packages, potentially breaking…
Notability: 2.0/10. Routine fork by same org, no novelty.