Together AI, published Sep 10, 2025

Fine-Tuning Platform Upgrades: Larger Models, Longer Contexts, Enhanced Hugging Face Integrations



Published 9/10/2025


Authors

Artem Chumachenko, Maksim Abraham, Soroush Bassam, Gleb Vazhenin, Egor Timofeev, Conner Manuel, Zain Hasan, Will Van Eaton, Max Ryabinin


Model customization is a versatile tool for many kinds of AI developers. For instance, you can make the strongest open LLMs even better on business-critical tasks by fine-tuning them on domain-specific data. You can also drastically reduce both inference cost and latency by training smaller but equally capable models.

Our goal with the Together Fine-Tuning Platform is to streamline model training for AI developers, helping them quickly build the best models for their applications with convenient and affordable tools. This release brings a new package of improvements that drastically expands the scope of what you can train: from native support for over a dozen of the latest LLMs to new DPO options and better integrations with the Hugging Face Hub. Learn more about the new features in this blog post!

Large models at Together AI

In 2025, a great number of models with over 100B parameters have been released to the public. These models, such as DeepSeek-R1, Qwen3-235B, or Llama 4 Maverick, offer a dramatic jump in capabilities, sometimes rivaling even the strongest proprietary models on certain tasks. With fine-tuning, you can further refine the abilities of these models, steering them towards the precise behavior you need or showing them how to solve complex tasks by providing SFT demonstrations. However, training large models is challenging: even orchestrating multi-node jobs reliably can be non-trivial, and running them efficiently requires huge time investments across the stack.

Now, you can train the latest large models on the Together Fine-Tuning Platform! By implementing the latest training optimizations and carefully engineering our platform, we made it possible to easily train models with hundreds of billions of weights at a low cost. We recently announced the general availability of OpenAI's gpt-oss fine-tuning on our platform, and now we support even more model families, covering recent releases by DeepSeek, Qwen, and Meta. The full list of new large models is as follows:

- openai/gpt-oss-120b
- deepseek-ai/DeepSeek-V3.1
- deepseek-ai/DeepSeek-V3.1-Base
- deepseek-ai/DeepSeek-R1-0528
- deepseek-ai/DeepSeek-R1
- deepseek-ai/DeepSeek-V3-0324
- deepseek-ai/DeepSeek-V3
- deepseek-ai/DeepSeek-V3-Base
- Qwen/Qwen3-Coder-480B-A35B-Instruct
- Qwen/Qwen3-235B-A22B (context length 32768 for SFT and 16384 for DPO)
- Qwen/Qwen3-235B-A22B-Instruct-2507 (context length 32768 for SFT and 16384 for DPO)
- meta-llama/Llama-4-Maverick-17B-128E
- meta-llama/Llama-4-Maverick-17B-128E-Instruct
- meta-llama/Llama-4-Scout-17B-16E
- meta-llama/Llama-4-Scout-17B-16E-Instruct
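For readers preparing data for these models, the stated limits can be written down as a small lookup. The following is a minimal sketch, not part of any Together API: the post gives 16,384 tokens for SFT and 8,192 tokens for DPO as the default for the large models above, with larger limits for the two Qwen3-235B entries. The names `DEFAULT_LIMITS`, `OVERRIDES`, `max_context_length`, and `split_by_limit` are hypothetical helpers introduced only for illustration.

```python
# Context limits (in tokens) as stated in the post for the newly added large models.
# All names here are illustrative and do not come from the Together SDK.
DEFAULT_LIMITS = {"sft": 16_384, "dpo": 8_192}

# Models the post lists with limits above the default.
OVERRIDES = {
    "Qwen/Qwen3-235B-A22B": {"sft": 32_768, "dpo": 16_384},
    "Qwen/Qwen3-235B-A22B-Instruct-2507": {"sft": 32_768, "dpo": 16_384},
}


def max_context_length(model: str, method: str) -> int:
    """Return the stated limit for a model and training method ("sft" or "dpo").

    Assumes `model` is one of the large models listed above; any name without
    an override falls back to the default limits.
    """
    method = method.lower()
    if method not in DEFAULT_LIMITS:
        raise ValueError(f"unknown training method: {method!r}")
    return OVERRIDES.get(model, DEFAULT_LIMITS)[method]


def split_by_limit(token_counts, limit):
    """Partition example indices into those that fit the limit and those that exceed it."""
    fits = [i for i, n in enumerate(token_counts) if n <= limit]
    too_long = [i for i, n in enumerate(token_counts) if n > limit]
    return fits, too_long


if __name__ == "__main__":
    limit = max_context_length("Qwen/Qwen3-235B-A22B", "sft")
    # Token counts are made-up numbers; real counts would come from the model's tokenizer.
    fits, too_long = split_by_limit([1_200, 40_000, 32_768], limit)
    print(limit, fits, too_long)
```

The token counts passed to `split_by_limit` are assumed to be computed elsewhere with the target model's own tokenizer, since lengths differ between tokenizers. How the platform treats over-length examples is not described in the post, so the sketch only reports them.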

Unless stated otherwise, we support a context length of 16,384 tokens for SFT and 8,192 tokens for DPO training. Once the training run finishes, you can start a Dedicated Endpoint to run inference for these models, as well as download their final or intermediate checkpoints. See the complete list of models supported for fine-tuning in our docs, and check out the pricing page for details about the cost of fine-tuning 100B+ parameter models.

Context length extensions

With recent progress on tasks such as long-document processing, editing of large codebases, and agentic interaction chains, reliable handling of long contexts is more important than ever. Ideally, you want long examples like these to be present in your training data, as this eliminates the mismatch between training and test-time inputs and improves results on the target task.

Given this trend, we wanted to make it possible for AI developers to harness long-context abilities in fine-tuning. To make this happen, we have overhauled our training systems and identified ways to increase the maximum supported context length for most of our models, at no additional cost to you. On average, you can expect 2x-4x increases to the context length, with some settings (like Llama 3.1-8B or Gemma 3-4B) jumping to their maximum length of 131k tokens. See the picture below for example context length increases:

[Figure: example context length increases]

Slingshot AI, the company behind the AI therapy app Ash, built a foundation model for psychology and fine-tuned it with long-context clinical conversations. For their use case, long-context fine-tuning was essential to capture the full scope of these conversations.

"The technical challenge was running our multi-stage pipeline reliably at the conversation lengths our therapy models require. Together AI's platform eliminated the context length constraints and job failures we hit elsewhere, letting us experiment rapidly."
- Daniel Cahn, Co-founder & CEO, Slingshot AI

For some of the larger models like Llama-3.3-70B, we also offer a separate option of full-context fine-tuning. See the complete list of such models, as well as the context lengths we support, in the docs.

Our work here is far from done: as we discover and implement additional optimizations of our training systems, we will push for larger context lengths (even for 100B+ models) while aiming to keep the runtime and costs of training low. If you need long-context training for a model that is currently missing, or need to further increase the context length, we would love to learn more about your use case and support it!

Fine-tune your own model, upload to HF Hub

Given the tempo of acceleration in AI nowadays, you can often see increasingly stronger models trained for specific tasks and released nearly…
