Together AI, published Dec 23, 2025

MiniMax Speech 2.6 Turbo now available natively on Together AI

Published 12/23/2025

State-of-the-art multilingual TTS with human-level, emotionally aware voices in 40+ languages and real-time latency on dedicated, production-grade infrastructure.

Authors

Arielle Fidel, Rajas Bansal, Sahil Yadav, Rishabh Bhargava, Sonny Khan

Summary

MiniMax Speech 2.6 Turbo on Together AI:
Top-ranked on Artificial Analysis Arena; available on dedicated infrastructure only on Together AI
Sub-250ms latency, 40-plus languages with streaming inline switching, 10-second voice cloning, and automatic emotional awareness
Expands the lineup of elite proprietary TTS models on Together AI, alongside Cartesia and Rime models
Dedicated GPU endpoints co-located with LLM and STT workloads

Building a real-time voice agent usually forces an ugly choice: ship a voice that sounds convincingly human, or ship a voice that responds instantly and holds up in production. Most teams split the difference with a patchwork of providers: one for showcase experiences, another for low-latency turns, and others for cloning or global language coverage. Over time that patchwork becomes the product. Behavior diverges by market, latency and quality drift, and "upgrade the voice" turns into a cross-vendor infrastructure project instead of a product decision.

Starting today, Together AI, the AI Native Cloud, is the only platform where you can run MiniMax Speech 2.6 Turbo on dedicated infrastructure alongside your LLM and STT workloads, so naturalness and speed live on one platform instead of being traded off across vendors. MiniMax Speech 2.6 Turbo ranks at the top of public TTS leaderboards, is built by the team behind Talkie (150 million users, with average sessions of more than 90 minutes), and is trained for real conversational interaction rather than read-aloud narration.

Requests run on Together AI infrastructure with zero data retention, SOC 2 Type II and HIPAA support, and data residency options. You get a single production surface for streaming delivery, capacity, and debugging, with one API, one auth, and unified metrics, so conversational latency becomes an infrastructure guarantee rather than an integration tax.

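MiniMax multilingual: English to Japanese to Spanish streaming language switching (audio sample)

"Welcome to our service. Our AI seamlessly bridges the gap between cultures in real-time. 日本語でもサポートできます。言葉の壁を越えて、世界中の人々と自然につながることができます。También ofrecemos soporte en español. Porque creemos que la comunicación global debe ser así de simple."

(English translation of the Japanese and Spanish passages: "We can also provide support in Japanese. You can move beyond language barriers and connect naturally with people around the world." / "We also offer support in Spanish. Because we believe global communication should be this simple.")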

Why naturalness drives engagement

MiniMax Speech 2.6 Turbo ranks at the top of Artificial Analysis Arena in blind human evaluation. The model is trained on Talkie conversation data, where 150 million users chose to engage with AI voice in sessions averaging more than 90 minutes. Instead of learning from audiobook and podcast narration, MiniMax Speech 2.6 Turbo learned from real dialogue, which produces different prosody, pacing, and emotional range.

Teams building AI-native voice products choose models knowing that voice quality directly drives completion rates. A customer service agent can have correct intent recognition and strong LLM reasoning, yet a synthetic-sounding delivery still causes users to drop off. MiniMax Speech 2.6 Turbo is now available on Together AI with performance isolation and reliability tuned for production workloads at scale.

Technical capabilities

40-plus languages with streaming inline switching

Native-quality speech across major global languages, with streaming inline language switching: the model can move between English, Japanese, Spanish, Mandarin, French, and German mid-sentence with authentic accents. It detects language boundaries and switches to native pronunciation in real time.

Automatic emotional awareness

The model analyzes semantic context and adapts prosody. When your LLM outputs apologetic language, MiniMax shifts to an empathetic delivery. Upbeat greetings sound upbeat; serious warnings sound serious. This happens automatically across all 40-plus languages, without prompt engineering or markup.

MiniMax emotional awareness: the same phrase in empathetic, upbeat, and serious tones (audio sample)

Empathetic: "I understand. I'm sorry to hear you're experiencing this issue."
Upbeat: "I understand! Great question, let me help with that."
Serious: "I understand. This is a critical security matter."
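Per the post, neither inline language switching nor emotional adaptation needs extra request fields: both are inferred from the text itself. A minimal sketch of what that means for a client, assuming Together AI exposes an OpenAI-style `/v1/audio/speech` route that takes `model`, `input`, and `voice` fields. The model ID `minimax/speech-2.6-turbo`, the voice name `example-voice`, and the `response_format` field are hypothetical placeholders, not confirmed values; check Together AI's TTS documentation before use. The mixed-language text goes into `input` unchanged, with no per-language segmentation or SSML. The function only builds the request; sending it is left commented out.

```python
import json
import os
import urllib.request

# Assumption: OpenAI-style speech route on Together AI's API.
TTS_URL = "https://api.together.xyz/v1/audio/speech"
# Hypothetical placeholders: look up the real model ID and voice names.
MODEL_ID = "minimax/speech-2.6-turbo"
VOICE = "example-voice"


def build_tts_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for speech for `text`.

    The text is passed through untouched: no language tags, no SSML,
    no emotion hints, since the post says the model infers both.
    """
    body = {
        "model": MODEL_ID,
        "input": text,
        "voice": VOICE,
        "response_format": "mp3",  # assumed field name and value
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    mixed = (
        "I'm sorry to hear you're experiencing this issue. "
        "日本語でもサポートできます。"
        "También ofrecemos soporte en español."
    )
    req = build_tts_request(mixed, os.environ.get("TOGETHER_API_KEY", "placeholder"))
    print(req.get_method(), req.full_url)
    # Sending it is a network call and is not run here:
    # with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as f:
    #     f.write(resp.read())
```

The design point is what the client does not do: it never splits the text by language or annotates tone, which is the integration work the post says the model removes.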

10-second voice cloning

Clone a voice from a 10-second audio sample, and that voice can speak 40-plus languages with native accents. The model handles imperfect recordings (background noise, accents, disfluencies) and produces fluent output while preserving the speaker's unique timbre. Create a branded voice for your application and deploy it globally through Together AI. Professional voice cloning services are available through Sales.

MiniMax voice original: 10-second original sample (audio sample)

"A specific, you know, a specific piece of information or some event or something on their website or something that they know, hey, when they have this information, they have a much higher propensity to need our products."

MiniMax voice cloning: multilingual output generated from that sample (audio sample)

"Now, I am speaking with that exact same voice, created from just ten seconds of audio. 甚至可以用这个声音说中文，音色和说话习惯都完美保留了下来。 Et maintenant, écoutez ma voix en français. Remarquez la fluidité de la prononciation, qui reste fidèle à mon timbre original."

(English translation of the Chinese and French passages: "I can even speak Chinese with this voice, with its timbre and speaking habits perfectly preserved." / "And now, listen to my voice in French. Notice the fluency of the pronunciation, which stays true to my original timbre.")
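The post does not document the cloning upload API or which audio formats it accepts, so the sketch below covers only a local pre-flight step, under stated assumptions: the sample is an uncompressed PCM WAV file, and the 10-second figure from the post is treated as a minimum length. `check_clone_sample` and `make_silent_wav` are hypothetical helpers built on Python's standard `wave` module; nothing here talks to Together AI or MiniMax.

```python
import io
import wave

# Assumption: the post's "10-second sample" is treated as a minimum length.
MIN_CLONE_SECONDS = 10.0


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of a PCM WAV file supplied as raw bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_clone_sample(data: bytes, min_seconds: float = MIN_CLONE_SECONDS) -> float:
    """Hypothetical pre-upload check: reject samples shorter than `min_seconds`."""
    duration = wav_duration_seconds(data)
    if duration < min_seconds:
        raise ValueError(f"sample is {duration:.2f}s; need at least {min_seconds:.0f}s")
    return duration


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono, 16-bit silent WAV in memory (stand-in for a real recording)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    print(check_clone_sample(make_silent_wav(12)))  # a 12-second sample passes
    try:
        check_clone_sample(make_silent_wav(5))
    except ValueError as err:
        print("rejected:", err)
```

Because the post says the model tolerates noise and disfluency, the check deliberately looks only at length, not at audio quality.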

Sub-250ms latency on Together AI infrastructure

MiniMax achieves sub-250ms latency on Together AI dedicated endpoints. When TTS runs alongside LLM and STT workloads on the same infrastructure, you eliminate cross-vendor network overhead. The complete pipeline from speech…
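The excerpt does not say how the sub-250ms figure is measured. For streaming TTS, a common definition is time to first audio: how long until the first non-empty audio chunk arrives. A minimal sketch for checking that against your own deployment, assuming you can obtain any iterator of audio byte chunks (for example a streaming HTTP response body read in pieces). `fake_tts_stream` is a hypothetical stand-in for the network, and the 250 ms budget is simply the post's figure reused as a target.

```python
import time
from typing import Iterable, Iterator, List, Optional, Tuple

# The post's sub-250ms figure, reused here only as a target to compare against.
LATENCY_BUDGET_S = 0.250


def time_to_first_chunk(
    chunks: Iterable[bytes], start: Optional[float] = None
) -> Tuple[float, bytes]:
    """Return (seconds until the first non-empty chunk, all audio bytes).

    Pass `start=time.perf_counter()` taken just before sending the request
    to include request time; otherwise the clock starts on the first pull.
    """
    if start is None:
        start = time.perf_counter()
    first: Optional[float] = None
    parts: List[bytes] = []
    for chunk in chunks:
        if not chunk:
            continue  # skip empty reads
        if first is None:
            first = time.perf_counter() - start
        parts.append(chunk)
    if first is None:
        raise ValueError("stream produced no audio")
    return first, b"".join(parts)


def fake_tts_stream(first_delay_s: float, n_chunks: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    time.sleep(first_delay_s)
    for _ in range(n_chunks):
        yield b"\x00" * 320  # 10 ms of 16 kHz, 16-bit mono silence


if __name__ == "__main__":
    ttfa, audio = time_to_first_chunk(fake_tts_stream(0.05, 5))
    print(f"first audio after {ttfa * 1000:.0f} ms, {len(audio)} bytes total")
    print("within budget" if ttfa < LATENCY_BUDGET_S else "over budget")
```

Timing from the client side captures the network hops the post is talking about, which is why co-locating TTS with the LLM and STT stages should show up in this number.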
