What does this writing signal mean?

Together AI published "Rime Arcana V3 Turbo and Rime Arcana V3 now available on Together AI." This writing signal gives public context for research themes, product direction, policy, or launch framing. High-signal details: New model release on Together AI platform · Rime Arcana V3 Turbo and Rime Arcana V3 now available on Together AI. onlylabs links this event to 1 captured evidence page and 6 related writing signals.

Together AI Writing: Rime Arcana V3 Turbo and Rime Arcana V3 now available on Together AI

Captured source

source ↗

together.ai/together.ai/blog/rime-arcana-v3-turbo-and-rime-arcana-v3-now-available-on-together-ai

published Feb 4, 2026 · seen Jun 5 · captured Jun 7 · http 200 · method plain

Published 2/4/2026

Rime Arcana V3 Turbo and Rime Arcana V3 now available on Together AI

High-performance multilingual TTS with native code-switching and real-time latency on dedicated endpoints.

Authors

Sahil Yadav, Arielle Fidel, Rajas Bansal, Rishabh Bhargava, Sonny Khan

Summary

Starting today, two new Rime models are available on Together AI: Rime Arcana V3 Turbo (English–Spanish, performance) and Rime Arcana V3 (11-language switching).

Native code-switching that keeps cadence and prosody consistent across language boundaries.

Rime Arcana V3 Turbo: ~120 ms time-to-first-audio on Together AI dedicated endpoints.

Co-located with LLM and STT workloads, with one API and unified observability.

When a caller code-switches mid-sentence, most voice agents lose what makes them sound native. Cadence slips, the response lands like a translation, and trust drops. Teams patch it by routing between language-specific TTS models, but the handoff adds latency and makes voice behavior inconsistent inside the same conversation. Rime's Arcana V3 line is built for that moment: natural code-switching at production speed without turning multilingual into a routing problem. Starting today, Together AI, the AI Native Cloud, is adding Rime Arcana V3 Turbo and Rime Arcana V3 to the Together Model Library. V3 Turbo delivers English–Spanish code-switching at ~120 ms time-to-first-audio on dedicated endpoints, with prosody trained on bilingual speech patterns. V3 expands switching across 11 languages from a single model. Both run co-located with your LLM and STT workloads behind the same API, authentication, and observability surface you already use.
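The time-to-first-audio figure quoted above can be checked against any streaming TTS response by timing the arrival of the first non-empty audio chunk. The following is a minimal sketch, not Together AI's or Rime's measurement method: `time_to_first_audio_ms` and `fake_tts_stream` are hypothetical names, and the fake stream stands in for a real streaming response so the example runs without network access.

```python
import time
from typing import Callable, Iterable, Iterator


def time_to_first_audio_ms(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return milliseconds elapsed until the first non-empty chunk arrives."""
    start = clock()
    for chunk in chunks:
        if chunk:  # skip empty chunks that carry no audio
            return (clock() - start) * 1000.0
    raise ValueError("stream ended before any audio arrived")


def fake_tts_stream(first_chunk_delay_s: float) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response (assumption)."""
    time.sleep(first_chunk_delay_s)  # simulated synthesis and network delay
    yield b"\x00\x01"  # first audio bytes
    yield b"\x02\x03"  # later audio bytes


if __name__ == "__main__":
    ttfa = time_to_first_audio_ms(fake_tts_stream(0.12))
    print(f"time-to-first-audio: {ttfa:.0f} ms")
```

With a real endpoint, the timer would need to start when the request is sent, so that connection setup and queueing are included in the measurement.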

Audio sample: Hi, thanks for calling customer support. I can help you in multiple languages. (English, German, French, Japanese)

V3 Turbo: Performance for real-time bilingual conversations

~120ms time-to-first-audio

Voice agents need end-to-end latency under 700ms to feel conversational, which means TTS must leave headroom for STT and LLM processing. V3 Turbo hits ~120ms time-to-first-audio on Together AI dedicated endpoints, so when a customer switches from English to Spanish mid-sentence, the agent's bilingual response arrives in stride. Co-locating V3 Turbo with LLM and STT on Together AI keeps the full pipeline (speech recognition through reasoning to synthesis) within that 700ms budget.

English-Spanish code-switching trained on native bilingual speech

Bilingual callers mix languages inside a sentence. V3 Turbo is trained on those patterns, including where pauses land and how stress shifts at the boundary. A customer says, "I need help with my account, es que no puedo acceder." V3 Turbo can respond in the same mixed register, with pauses and emphasis that match how bilingual speakers actually talk.

Efficient concurrency for high-volume deployments

V3 Turbo's performance enables higher concurrency per GPU. For contact centers handling thousands of concurrent calls in bilingual markets, this means fewer GPUs to maintain production latency when customers code-switch, reducing total cost of ownership while preserving conversational quality.

V3: Multilingual breadth with code-switching

~160ms time-to-first-audio across 11 languages

V3 reaches ~160ms p50 time-to-first-audio on Together AI dedicated endpoints while supporting code-switching across 11 languages. This keeps multilingual conversations responsive even as the model handles the complexity of natural transitions between any supported language pair.

11 languages with natural transitions

V3 supports 11 languages and can code-switch between supported languages. A customer starts in French, switches to English for a technical term, then back to French for clarification. V3 handles these transitions while preserving prosody and accent consistency.

Single model for multilingual markets

V3 lets teams consolidate what used to require separate models or vendors per language. Deploy once and serve multilingual customers from a single endpoint without maintaining separate infrastructure per market. When the conversation switches languages, V3 keeps cadence and emphasis natural so the transition does not sound stitched together.
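The 700ms end-to-end budget and the two time-to-first-audio figures quoted above reduce to simple arithmetic. The sketch below shows how much headroom each model leaves for STT and LLM processing, and how a p50 is read from a set of samples. The function names, the STT and LLM timings, and the sample list are illustrative assumptions, not measurements from the post.

```python
from statistics import median

END_TO_END_BUDGET_MS = 700  # conversational threshold cited in the post

# Approximate time-to-first-audio figures quoted in the post.
TTS_TTFA_MS = {"V3 Turbo": 120, "V3": 160}


def headroom_ms(tts_ttfa_ms: float, budget_ms: float = END_TO_END_BUDGET_MS) -> float:
    """Milliseconds left for STT and LLM once TTS has taken its share."""
    return budget_ms - tts_ttfa_ms


def fits_budget(stt_ms: float, llm_ms: float, tts_ttfa_ms: float,
                budget_ms: float = END_TO_END_BUDGET_MS) -> bool:
    """True if the three pipeline stages together stay within the budget."""
    return stt_ms + llm_ms + tts_ttfa_ms <= budget_ms


def p50(samples_ms: list[float]) -> float:
    """Median latency: half of the requests are at or below this value."""
    return median(samples_ms)


if __name__ == "__main__":
    for name, ttfa in TTS_TTFA_MS.items():
        print(f"{name}: {headroom_ms(ttfa)} ms left for STT + LLM")
    # Hypothetical stage timings, chosen only to illustrate the check.
    print(fits_budget(stt_ms=150, llm_ms=350, tts_ttfa_ms=120))
    # Hypothetical latency samples; one slow outlier does not move the p50.
    print(p50([150, 155, 160, 170, 400]))
```

Under these figures V3 Turbo leaves 580ms and V3 leaves 540ms for the rest of the pipeline. Because p50 is a median, tail latency is not captured by it and would need a separate percentile.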

Audio sample: Hi, nice to meet you! (English, Spanish, French, German, Portuguese, Arabic, Hebrew, Hindi, Japanese, Tamil)

Use cases

Bilingual metro markets

In bilingual metro markets, customer service calls routinely involve code-switching. Customers start in English, switch to Spanish for culturally specific context, switch back for confirmation. V3 Turbo handles these transitions at ~120ms time-to-first-audio, so customers stay in the automated flow longer instead of requesting transfer to human agents. Together AI dedicated endpoints keep performance consistent even during peak call volume.

Regulated services in bilingual contexts

Banks, healthcare providers, and government services serving bilingual communities need agents that code-switch the way their customers do. A customer calling about a prescription might use English for most of the conversation, but switch to their native language for symptoms or medication names. Natural switching reduces repeats and transfers because callers stop testing the agent's language ability mid-call. Running your full voice stack on Together AI means one compliance review covers LLM, STT, and TTS.

International call centers

Call centers serving multilingual markets handle customers who code-switch across multiple languages in a single call. A business customer in Luxembourg might mix French, German, and English in one...

Excerpt shown — open the source for the full document.

Notability

notability 6.0/10

New model release on Together AI platform