Rime Arcana V3 Turbo and Rime Arcana V3 now available on Together AI
Published 2/4/2026
High-performance multilingual TTS with native code-switching and real-time latency on dedicated endpoints.
Authors
Sahil Yadav, Arielle Fidel, Rajas Bansal, Rishabh Bhargava, Sonny Khan
Summary
Starting today, two new Rime models are available on Together AI: Rime Arcana V3 Turbo (English–Spanish, performance) and Rime Arcana V3 (11-language switching).
Native code-switching that keeps cadence and prosody consistent across language boundaries.
Rime Arcana V3 Turbo: ~120 ms time-to-first-audio on Together AI dedicated endpoints.
Co-located with LLM and STT workloads, with one API and unified observability.
When a caller code-switches mid-sentence, most voice agents lose what makes them sound native. Cadence slips, the response lands like a translation, and trust drops. Teams patch it by routing between language-specific TTS models, but the handoff adds latency and makes voice behavior inconsistent within the same conversation. Rime's Arcana V3 line is built for that moment: natural code-switching at production speed without turning multilingual support into a routing problem.

Starting today, Together AI, the AI Native Cloud, is adding Rime Arcana V3 Turbo and Rime Arcana V3 to the Together Model Library. V3 Turbo delivers English–Spanish code-switching at ~120 ms time-to-first-audio on dedicated endpoints, with prosody trained on bilingual speech patterns. V3 expands switching across 11 languages from a single model. Both run co-located with your LLM and STT workloads behind the same API, authentication, and observability surface you already use.
Hi — thanks for calling customer support. I can help you in multiple languages. (English, German, French, Japanese)
V3 Turbo: Performance for real-time bilingual conversations

~120ms time-to-first-audio
Voice agents need end-to-end latency under 700ms to feel conversational, which means TTS must leave headroom for STT and LLM processing. V3 Turbo hits ~120ms time-to-first-audio on Together AI dedicated endpoints, so when a customer switches from English to Spanish mid-sentence, the agent's bilingual response arrives in stride. Co-locating V3 Turbo with LLM and STT on Together AI keeps the full pipeline (speech recognition through reasoning to synthesis) within that 700ms budget.

English-Spanish code-switching trained on native bilingual speech
Bilingual callers mix languages within a single sentence. V3 Turbo is trained on those patterns, including where pauses land and how stress shifts at the language boundary. A customer says, "I need help with my account, es que no puedo acceder." V3 Turbo can respond in the same mixed register, with pauses and emphasis that match how bilingual speakers actually talk.

Efficient concurrency for high-volume deployments
V3 Turbo's performance enables higher concurrency per GPU. For contact centers handling thousands of concurrent calls in bilingual markets, this means fewer GPUs are needed to maintain production latency when customers code-switch, reducing total cost of ownership while preserving conversational quality.

V3: Multilingual breadth with code-switching

~160ms time-to-first-audio across 11 languages
V3 reaches ~160ms p50 time-to-first-audio on Together AI dedicated endpoints while supporting code-switching across 11 languages. This keeps multilingual conversations responsive even as the model handles natural transitions between any supported language pair.

11 languages with natural transitions
V3 supports 11 languages and can code-switch between any of them. A customer starts in French, switches to English for a technical term, then returns to French for clarification. V3 handles these transitions while preserving prosody and accent consistency.

Single model for multilingual markets
V3 lets teams consolidate what used to require separate models or vendors per language. Deploy once and serve multilingual customers from a single endpoint without maintaining separate infrastructure per market. When the conversation switches languages, V3 keeps cadence and emphasis natural so the transition does not sound stitched together.
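To make the 700ms budget concrete, the sketch below computes how much time is left for speech recognition and LLM processing once TTS time-to-first-audio is spent. Only the 700ms target and the ~120ms (V3 Turbo) and ~160ms p50 (V3) figures come from this post; the function names and the STT/LLM latencies in the example are hypothetical, and the model assumes the three stages run back to back with network overhead ignored.

```python
"""Sketch of a voice-agent latency budget. Only the 700 ms target and the
~120 ms / ~160 ms TTFA figures come from the article; the helper names and
any STT/LLM latencies passed in are illustrative assumptions."""

END_TO_END_BUDGET_MS = 700  # conversational target cited in the article

# Approximate TTS time-to-first-audio on Together AI dedicated endpoints.
TTS_TTFA_MS = {
    "Rime Arcana V3 Turbo": 120,
    "Rime Arcana V3": 160,  # reported as p50
}


def remaining_budget_ms(tts_ttfa_ms: int, budget_ms: int = END_TO_END_BUDGET_MS) -> int:
    """Milliseconds left for STT and LLM once TTS time-to-first-audio is spent."""
    return budget_ms - tts_ttfa_ms


def fits_budget(stt_ms: int, llm_ms: int, tts_ttfa_ms: int,
                budget_ms: int = END_TO_END_BUDGET_MS) -> bool:
    """True if a sequential STT -> LLM -> TTS path fits within the budget.

    Assumes the stages run back to back and ignores network hops and
    audio buffering, which a real deployment would also have to count.
    """
    return stt_ms + llm_ms + tts_ttfa_ms <= budget_ms


if __name__ == "__main__":
    for model, ttfa in TTS_TTFA_MS.items():
        print(f"{model}: {remaining_budget_ms(ttfa)} ms left for STT + LLM")
    # Hypothetical stage latencies, chosen only for illustration.
    print(fits_budget(stt_ms=200, llm_ms=300,
                      tts_ttfa_ms=TTS_TTFA_MS["Rime Arcana V3 Turbo"]))
```

Under these assumptions, V3 Turbo leaves about 580ms and V3 about 540ms for recognition and reasoning, which is the headroom the post refers to.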
Hi, nice to meet you! (English, Spanish, French, German, Portuguese, Arabic, Hebrew, Hindi, Japanese, Tamil).
Use cases

Bilingual metro markets
In bilingual metro markets, customer service calls routinely involve code-switching. Customers start in English, switch to Spanish for culturally specific context, then switch back for confirmation. V3 Turbo handles these transitions at ~120ms time-to-first-audio, so customers stay in the automated flow longer instead of requesting a transfer to a human agent. Together AI dedicated endpoints keep performance consistent even during peak call volume.

Regulated services in bilingual contexts
Banks, healthcare providers, and government services serving bilingual communities need agents that code-switch the way their customers do. A customer calling about a prescription might use English for most of the conversation but switch to their native language for symptoms or medication names. Natural switching reduces repeats and transfers because callers stop testing the agent's language ability mid-call. Running your full voice stack on Together AI means one compliance review covers LLM, STT, and TTS.

International call centers
Call centers serving multilingual markets handle customers who code-switch across multiple languages in a single call. A business customer in Luxembourg might mix French, German, and English in one…