What does this repo signal mean?

StepFun published the Python repository stepfun-ai/Step-Audio-EditX. A repository signal like this can expose tooling, eval, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo stepfun-ai/Step-Audio-EditX · language Python · solid new repo with 926 GitHub stars. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

StepFun Repo: stepfun-ai/Step-Audio-EditX

Captured source

GitHub · github.com/stepfun-ai/Step-Audio-EditX

stepfun-ai/Step-Audio-EditX repository metadata

published Oct 29, 2025 · seen Jun 5 · captured Jun 11 · http 200 · method plain

stepfun-ai/Step-Audio-EditX

Description: A powerful 3B-parameter, LLM-based Reinforcement Learning audio edit model excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech

Language: Python

License: Apache-2.0

Stars: 929

Forks: 69

Open issues: 37

Created: 2025-10-29T11:54:17Z

Pushed: 2026-04-09T02:27:46Z

Default branch: main

Fork: no

Archived: no
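
The fields above line up with what the GitHub REST API returns for a repository object. As a minimal sketch, assuming the capture was built from that endpoint (the page does not say how it was collected), the mapping could look like this. `summarize_repo`, `yes_no`, and `sample` are hypothetical names introduced for illustration, and `sample` is filled in from the values listed above rather than from a live request.

```python
def yes_no(flag):
    """Render a boolean as 'yes' or 'no', matching the capture's style."""
    return "yes" if flag else "no"


def summarize_repo(payload):
    """Map a GitHub REST API repository object to the labels used above."""
    # "license" is null in the API response when a repo has no detected license.
    license_info = payload.get("license") or {}
    return {
        "Description": payload.get("description"),
        "Language": payload.get("language"),
        "License": license_info.get("spdx_id"),
        "Stars": payload.get("stargazers_count"),
        "Forks": payload.get("forks_count"),
        "Open issues": payload.get("open_issues_count"),
        "Created": payload.get("created_at"),
        "Pushed": payload.get("pushed_at"),
        "Default branch": payload.get("default_branch"),
        "Fork": yes_no(payload.get("fork")),
        "Archived": yes_no(payload.get("archived")),
    }


# Sample payload assembled from the captured values, not a live API response.
sample = {
    "language": "Python",
    "license": {"spdx_id": "Apache-2.0"},
    "stargazers_count": 929,
    "forks_count": 69,
    "open_issues_count": 37,
    "created_at": "2025-10-29T11:54:17Z",
    "pushed_at": "2026-04-09T02:27:46Z",
    "default_branch": "main",
    "fork": False,
    "archived": False,
}
summary = summarize_repo(sample)
```

In practice the payload would come from a GET request to `https://api.github.com/repos/stepfun-ai/Step-Audio-EditX`. Note that the API's `open_issues_count` includes open pull requests as well as issues.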

README:

Step-Audio-EditX

🔥🔥🔥 News!!!

Jan 29, 2026:
🧩 New Model Release:
  Better performance, with an overall improvement of over 4%.
  More paralinguistic tags have been added, including `exhale`, `snort`, `inhale`, `chuckle`, `clears throat`, `giggle`.
  Try it out at StepFun Audio Studio.
💻 We release the SFT, DPO and GRPO training code.
🌟 Training and inference for vLLM are now supported. Thanks to the vLLM team!
Nov 28, 2025: 🚀 New Model Release: Now supporting `Japanese` and `Korean` languages.
Nov 23, 2025: 📊 Step-Audio-Edit-Benchmark Released!
Nov 19, 2025: ⚙️ We release a new version of our model, which supports polyphonic pronunciation control and improves the performance of emotion, speaking style, and paralinguistic editing.
Nov 12, 2025: 📦 We release the optimized inference code and model weights of Step-Audio-EditX (HuggingFace; ModelScope) and Step-Audio-Tokenizer (HuggingFace; ModelScope)
Nov 07, 2025: ✨ Demo Page ; 🎮 HF Space Playground
Nov 06, 2025: 👋 We release the technical report of Step-Audio-EditX.

Introduction

We are open-sourcing Step-Audio-EditX, a powerful 3B-parameter LLM-based Reinforcement Learning audio model specialized in expressive and iterative audio editing. It excels at editing emotion, speaking style, and paralinguistics, and also features robust zero-shot text-to-speech (TTS) capabilities.

📑 Open-source Plan

[x] Inference Code
[x] Online demo (Gradio)
[x] Step-Audio-Edit-Benchmark
[x] Model Checkpoints
  [x] Step-Audio-Tokenizer
  [x] Step-Audio-EditX
  [x] Step-Audio-EditX-Int4
[ ] Training Code
  [x] SFT training
  [x] DPO training
  [x] GRPO training
  [ ] PPO training
[ ] ⏳ Feature Support Plan
  [ ] Editing
    [x] Polyphone pronunciation control
    [x] More paralinguistic tags ([Cough, Crying, Stress, etc.])
    [ ] Filler word removal
  [ ] Other Languages
    [x] Japanese, Korean
    [ ] Arabic, French, Russian, Spanish, etc.

Features

Zero-Shot TTS
Excellent zero-shot TTS cloning for Mandarin, English, Sichuanese, and Cantonese.
To use a dialect or another language, add a `[Sichuanese]` / `[Cantonese]` / `[Japanese]` / `[Korean]` tag before your text.
🔥 Polyphone pronunciation control: replace each polyphonic character with its pinyin.
[我也想过过过儿过过的生活] -> [我也想guo4guo4guo1儿guo4guo4的生活]
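
The two text conventions above (a language tag in front of the text, and pinyin substituted for polyphonic characters) can be sketched as plain string preprocessing. This is an illustration only: `tag_language` and `apply_pinyin` are hypothetical helpers, not functions from the Step-Audio-EditX codebase, and placing the tag directly before the text with no separator is an assumption.

```python
# Language and dialect tags named in the README excerpt above.
LANGUAGE_TAGS = {"Sichuanese", "Cantonese", "Japanese", "Korean"}


def tag_language(text, language=None):
    """Prefix text with a [Language] tag (hypothetical helper).

    Assumption: the tag sits directly before the text with no separator.
    """
    if language is None:
        return text  # no tag requested
    if language not in LANGUAGE_TAGS:
        raise ValueError(f"unsupported language tag: {language}")
    return f"[{language}]{text}"


def apply_pinyin(text, replacements):
    """Swap the characters at the given indices for pinyin (hypothetical helper).

    replacements maps a character index to a pinyin syllable with a tone
    number, e.g. {3: "guo4"}. Characters at other indices are kept.
    """
    return "".join(replacements.get(i, ch) for i, ch in enumerate(text))


# Reproduces the README's example sentence.
sentence = "我也想过过过儿过过的生活"
controlled = apply_pinyin(
    sentence, {3: "guo4", 4: "guo4", 5: "guo1", 7: "guo4", 8: "guo4"}
)
```

Choosing which characters to replace, and with which reading, is left to the caller here. The sketch only shows the substitution format from the README example.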

Emotion and Speaking Style Editing
Remarkably effective iterative control over emotions and styles, supporting dozens of options for editing.
Emotion Editing: [ *Angry*, *Happy*, *Sad*, *Excited*, *Fearful*, *Surprised*, *Disgusted*, etc. ]
Speaking Style Editing: [ *Act_coy*, *Older*, *Child*, *Whisper*, *Serious*, *Generous*, *Exaggerated*, etc. ]
More emotions and speaking styles are on the way. Get ready! 🚀

Paralinguistic Editing
Precise control over 10 types of paralinguistic features for more natural, human-like, and expressive synthetic audio.
Supported tags:
[ *Breathing*, *Laughter*, *Surprise-oh*, *Confirmation-en*, *Uhm*, *Surprise-ah*, *Surprise-wa*, *Sigh*, *Question-ei*, *Dissatisfaction-hnn* ]
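
A small check against this list can catch misspelled tags before audio is generated. The sketch below assumes that paralinguistic tags are written inline in the text as `[Tag]`, which this excerpt suggests but does not show. `unknown_tags` is a hypothetical helper, not part of the repository.

```python
import re

# The ten paralinguistic tags listed in the README excerpt above.
PARALINGUISTIC_TAGS = {
    "Breathing", "Laughter", "Surprise-oh", "Confirmation-en", "Uhm",
    "Surprise-ah", "Surprise-wa", "Sigh", "Question-ei", "Dissatisfaction-hnn",
}

# Matches any bracketed span that contains no nested brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def unknown_tags(text):
    """Return bracketed tags not in the list above, in order of appearance."""
    return [t for t in TAG_PATTERN.findall(text) if t not in PARALINGUISTIC_TAGS]


ok = unknown_tags("Well [Sigh] that was close [Laughter]")
bad = unknown_tags("Hi [Cough] there [Uhm]")
```

Because the check treats every bracketed span as a paralinguistic tag, it would also flag language tags such as `[Cantonese]`. A fuller version would keep a separate allow-list for those.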

Available Tags

Emotion
happy: Expressing happiness
angry: Expressing anger
sad: Expressing sadness
fear: Expressing fear
surprised: Expressing surprise
confusion: Expressing confusion
empathy: Expressing empathy and understanding
embarrass: Expressing embarrassment
excited: Expressing excitement and enthusiasm
depressed: Expressing a depressed or discouraged mood
admiration: Expressing admiration or respect
coldness: Expressing coldness and indifference
disgusted: Expressing disgust or aversion
humour: Expressing humor or playfulness

Speaking style
serious: Speaking in a serious or solemn manner
arrogant: Speaking in an arrogant manner
child: Speaking in a childlike manner
older: Speaking in an elderly-sounding manner
girl: Speaking in a light, youthful feminine manner
pure: Speaking in a pure, innocent manner
sister: Speaking in a mature, confident feminine manner
sweet: Speaking in a sweet, lovely manner
exaggerated: Speaking in an exaggerated, dramatic manner
ethereal: Speaking in a soft, airy, dreamy manner
whisper: Speaking in a whispering, very soft manner
generous: Speaking in a hearty, outgoing, and straight-talking manner
recite: Speaking in a clear, well-paced, poetry-reading manner
act_coy: Speaking in a sweet, playful, and endearing manner
warm: Speaking in a warm, friendly manner
shy: Speaking in a shy, timid manner
comfort: Speaking in a comforting, reassuring manner
authority: Speaking in an authoritative, commanding manner
chat: Speaking in a casual, conversational manner
radio: Speaking in a radio-broadcast manner
soulful: Speaking in a heartfelt, deeply emotional manner
gentle: Speaking in a gentle, soft manner
story: Speaking in a narrative, audiobook-style manner
vivid: Speaking in a lively, expressive manner
program: Speaking in a show-host/presenter manner
news: Speaking in a news broadcasting manner
advertising: Speaking in a polished, high-end commercial voiceover manner
roar: Speaking in a loud, deep, roaring manner
murmur: Speaking in a quiet, low manner
shout: Speaking in a loud, sharp, shouting manner
deeply: Speaking in a deep and low-pitched tone
loudly: Speaking in a loud and high-pitched tone

Paralinguistic
[sigh]: Sighing sound
[inhale]: Inhaling sound
[laugh]: Laughter sound
[chuckle]: Chuckling sound
[exhale]: Exhaling sound
[clears...

Excerpt shown — open the source for the full document.

Notability

notability 6.0/10

Solid new repo with 926 GitHub stars