Together AI expands fine-tuning service with tool calling, reasoning, and vision support
Captured source
source ↗Together AI expands fine-tuning service with tool calling, reasoning, and vision support
⚡️ FlashAttention-4: up to 1.3× faster than cuDNN on NVIDIA Blackwell →
Introducing Together AI's new look →
🔎 ATLAS: runtime-learning accelerators delivering up to 4x faster LLM inference →
⚡ Together GPU Clusters: self-service NVIDIA GPUs, now generally available →
📦 Batch Inference API: Process billions of tokens at 50% lower cost for most models →
🪛 Fine-Tuning Platform Upgrades: Larger Models, Longer Contexts →
All blog posts
Fine-Tuning
Published 3/18/2026
Together AI expands fine-tuning service with tool calling, reasoning, and vision support
Authors
Conner Manuel, Nikita Smetanin, Artem Chumachenko, Ruslan Khaidurov, Maksim Abraham, Sergei Vorobyov, Gleb Khaykin, Soroush Bassam, Gleb Vazhenin, Denis Fedorenko, Zain Hasan, Sonny Khan, Will Van Eaton, Jennifer Wu, Max Ryabinin
Table of contents
40+ Models Chosen for Production...40+ Models Chosen for Production...40+ Models Chosen for Production...
Links in this article
VLM fine-tuning R easoning fine-tuning F unction calling fine-tuning Documentation Contact Sales
What’s New
Tool call fine-tuning: Ensure agents execute structured actions reliably with end-to-end fine-tuning and inference on OpenAI-compatible schema. Reasoning fine-tuning: Specialized support for training models on “thinking” tokens in reasoning traces, allowing models to learn complex logic. Vision-language model fine-tuning: Native support for vision training to align vision-language models with complex, domain-specific visual data. Large model support: train the latest models with up to 1T parameters on our highly optimized and easy-to-use service.
As AI teams move from single-turn prompting to advanced multi-turn workflows, reliability breaks in predictable places: tool calls that don’t match schemas, reasoning that degrades over long interactions, and models that miss domain-specific visual signals. Fixing those issues usually requires post-training, but the workflow is often fragmented, slow to iterate, and hard to plan. Today, Together AI, the AI Native Cloud, is expanding Together Fine-Tuning with native support for tool call, reasoning, and vision-language model (VLM) fine-tuning. To support frontier-scale post-training, we have also upgraded the training stack to handle 100B+ parameter models more efficiently, delivering up to 6× higher throughput. In addition, we now support fine-tuning on datasets of up to 100GB in size. Finally, we now provide job cost estimations before training and ETA during training, so teams can better plan their experiments. "Together AI does for fine-tuning and inference what Vercel does for LLM-based apps — it removes the infrastructure layer so we can focus on our product. We fine‑tune and deploy customer‑specific models through simple API calls. That lets our existing team move from weekly to daily iteration, cut costs by 2–3×, and improve accuracy from 77% to 87%." — Lamara De Brouwer, Co-Founder & CTO, XY.AI Labs Tool Call Fine-tuning Tool calling is essential to many modern agentic use cases. Yet, out-of-the-box models often struggle with tool calling: hallucinating parameters, selecting incorrect functions, or failing to follow multi-step sequences. In tool calling workflows, even small inconsistencies can cascade into downstream failures. Our fine-tuning service now delivers an end-to-end solution for reliable, production-grade tool calling, spanning fine-tuning through inference. Tool calls can be included in training data using the OpenAI-compatible schema. Functions are defined in a top-level tools array, and our service validates that every tool_calls entry matches a declared tool, ensuring structurally correct data before training begins. At inference time, we’ve significantly improved tool call reliability to ensure the benefits of tool call fine-tuning translate into production performance. Enhanced parsing and validation improve correctness across a wide range of real-world use cases, supported by inference tool calling datasets curated from both community contributions and internal research. Tool-call fine-tuning is available for models from Qwen, Moonshot AI, and Z.AI. See the tool calling documentation to get started. To see an example of tool call functionality in code, take a look at our cookbook . Reasoning Fine-tuning Reasoning models generate intermediate thinking traces before producing a final answer, enabling step-by-step reasoning. However, reasoning formats are not standardized across models, introducing complexity into the reasoning fine-tuning process. Together Fine-tuning now supports fine-tuning directly on thinking traces using a reasoning or reasoning_content field in assistant messages. This lets you train models on domain-specific reasoning patterns while keeping traces structured and reproducible. As with tool calling, we have improved reasoning inference to ensure that fine-tuned capabilities translate into reliable downstream performance. Reasoning fine-tuning is available for models from Qwen and Z.AI. See our documentation page for supported models and details. For an end-to-end code demo of reasoning fine-tuning, check out our cookbook Vision-Language Model Fine-tuning Many AI workflows require models that can interpret image inputs. For domain-specific tasks like medical imaging and eCommerce, vision-language models (VLMs) may need to learn new visual patterns to be effective. Together Fine-tuning service now supports fine-tuning of vision-language models. Vision training data is provided inline using message content arrays with base64 encoded images. Fine-tuning jobs support hybrid datasets, allowing both image-text examples and text-only examples within the same run. By default, we freeze the vision encoder and update only the language layers. Setting train_vision=true enables joint training, allowing updates to both the vision encoder and language layers. VLM fine-tuning is available for models from Qwen, Google, and Meta. See the vision-language documentation for the supported list and usage details. You can also check out our cookbook for vision language fine-tuning here . Large Model Fine-tuning As the sizes of open models grow and context windows expand, the underlying training infrastructure has to keep pace. Trillion-parameter models cannot fit on a single node, thus needing careful communication and memory management across multiple machines. Even a…
Excerpt shown — open the source for the full document.
Notability
notability 7.0/10Notable service expansion with key AI capabilities.