Together AI, published Aug 21, 2025

How Together AI Uses AI Agents to Automate Complex Engineering Tasks: Lessons from Developing Efficient LLM Inference Systems

Published 8/21/2025

Authors

Shang Zhu, Federico Bianchi, Wai Tong Chung, Zain Hasan, Rupert Wu, Ce Zhang, James Zou, Ben Athiwaratkun

TLDR: Building AI agents to handle complex and long-running engineering tasks requires a different approach than typical AI agent applications. We illustrate key patterns for effective agent development through a real-world case study: using agents to accelerate LLM inference via speculative decoding.

Introduction

From Cursor and Claude Code to our recently released open data scientist, we’ve seen the power of coding agents in automating various applications (code understanding, review and debugging, etc.). Much less explored is the end-to-end automation of production workflows using these agents.

Most companies have already built different workflows and use-cases for their software, which are often managed by engineering and customer teams who spend a significant amount of time on repetitive infrastructure tasks: configuring environments, launching jobs, monitoring (potentially) long-running processes, collecting results, and orchestrating them all. These workflows are not always easy to automate, owing to complexity or variability in systems design and scale. Furthermore, these workflows could take days (or even weeks) to complete and require constant human oversight to handle frequent failures and edge cases.

At Together AI, we faced this exact challenge while developing our inference optimization workflows. We realized that successfully automating these complex tasks required rethinking how we manage LLM agents. This blog post distills the key principles we learned from building agents to handle complex engineering tasks, illustrated through our internal pipeline of developing efficient LLM inference algorithms.

The agentic system presented here has significantly reduced manual intervention while maintaining consistency and reliability. Our engineers can now oversee the training and control it while the agents take care of the “boring” and repetitive aspects of the work, thus reducing the turnaround time.

AI Agents for Complex Workflow Automation

We found that many coding agents today, such as Claude Code or OpenHands, can effectively follow instructions, edit and execute codebases, and even operate complex workflows. The key design space then becomes the overall architecture in which the agent is embedded.
The high-level overview of the workflow automation agent is as follows:

The key components for agent customization are the context and tools that we equip the agent with, including any internal tools the agent can call and the orchestration of these various tools in completing a multi-step engineering workflow. Example applications of these architectures might be training or evaluating a model, and optimizing hyperparameters of engineering systems. In the last section of this blog, we present a more detailed case study focusing on training an efficient speculator model to speed up LLM inference.

What Tasks Should We Automate?

To maximize efficiency, tasks selected for automation should meet specific criteria. They need to be:

- Verifiable - with clear success/failure conditions
- Well-defined - having unambiguous steps and boundaries
- Supported by existing tools or tools that can be feasibly integrated
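The three criteria above can be read as a simple screening checklist. The sketch below is only an illustration of that idea: `TaskProfile`, `unmet_criteria`, and the two example tasks are hypothetical names invented here, not part of the system described in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical description of a candidate task (illustrative only)."""
    name: str
    verifiable: bool       # clear success/failure conditions exist
    well_defined: bool     # unambiguous steps and boundaries
    tools_available: bool  # existing tools, or tools that can feasibly be integrated


def unmet_criteria(task: TaskProfile) -> list[str]:
    """Return the criteria the task fails; an empty list means it is a candidate."""
    checks = {
        "verifiable": task.verifiable,
        "well-defined": task.well_defined,
        "tool support": task.tools_available,
    }
    return [label for label, ok in checks.items() if not ok]


# Two made-up tasks: one that passes every check and one that does not.
job_monitoring = TaskProfile("job monitoring", True, True, True)
vague_request = TaskProfile("improve the system", False, False, True)

print(unmet_criteria(job_monitoring))  # []
print(unmet_criteria(vague_request))   # ['verifiable', 'well-defined']
```

Returning the list of failed criteria, rather than a single boolean, makes it clear what would have to change before a task becomes a reasonable automation target.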

Additionally, they should be generally repetitive for humans, requiring relatively minor adaptations across instances. For example, infrastructure configuration, job monitoring, and hyperparameter tuning in machine learning pipelines often fit this description: they are repetitive yet prone to human error, making them ideal candidates for agentic automation. By focusing on and automating such tasks, teams can offload this routine work to LLM agents while reserving human oversight for high-level decision-making and edge cases.

Now let’s dive into what it takes to practically automate these engineering tasks using LLM-powered agents!

Six Patterns for Building Automation Agents

Through extensive experimentation, we identified two sets of core patterns that allowed us to build effective autonomous agents for workflow management: Infrastructure Patterns and Behavioral Patterns.

Infrastructure Patterns

Infrastructure patterns center on how to build your agentic system in practice; these are useful to define the architecture and environment where the agent is embedded.

Good Tools

The agents we have today are already pretty good for automation. They can understand documentation, execute commands, and react to outputs reasonably well. What these agents rely on most are tools, which can be viewed as a way for the agent to interact with and modify its environment; for instance, cat is a tool that allows agents to read files. If the tools we build are well-abstracted and allow the agent to interact with a stable and informative environment (e.g. good error surfacing, nice logging, clean outputs), the agent will be able to complete the task. A stable environment doesn’t mean errors will never occur, but rather that the agent’s tools are well-defined and equip it to deal with and recover from these errors.
For example, one of our speculator training scripts used a custom directory that is available only on our training nodes and was not mounted in the Dockerized environment. We initially overlooked this and gave only standard instructions to our agent. The agent hit a roadblock here but was able to find a workaround by specifying new directories for file access. This was because the agent could interact directly with the tools…
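To make "good error surfacing" and "clean outputs" a little more concrete, here is a minimal sketch of a command-running tool that always hands the agent a structured result instead of raising or returning raw text. It is an assumption-laden illustration, not the tooling used in this post: `ToolResult`, `run_tool`, and the `max_chars` truncation limit are hypothetical names and choices.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolResult:
    """Hypothetical structured result handed back to the agent."""
    ok: bool
    exit_code: Optional[int]  # None if the command timed out or never started
    stdout: str
    stderr: str
    hint: str                 # short, human-readable summary of what went wrong


def run_tool(cmd: list[str], timeout_s: float = 60.0, max_chars: int = 2000) -> ToolResult:
    """Run a command and surface failures as data the agent can react to."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except OSError as exc:
        # Covers a missing executable, permission problems, and similar start-up failures.
        return ToolResult(False, None, "", "", f"could not start {cmd[0]!r}: {exc}")
    except subprocess.TimeoutExpired:
        return ToolResult(False, None, "", "", f"timed out after {timeout_s}s")
    ok = proc.returncode == 0
    hint = "" if ok else f"exited with code {proc.returncode}; see stderr"
    # Keep only the tail of each stream so long logs do not flood the agent's context.
    return ToolResult(ok, proc.returncode, proc.stdout[-max_chars:], proc.stderr[-max_chars:], hint)


# Usage: a command that prints some output and then fails with exit code 3.
result = run_tool([sys.executable, "-c", "import sys; print('step done'); sys.exit(3)"])
print(result.ok, result.exit_code, result.hint)
```

With a result like this, a failure such as a missing directory reaches the agent as an exit code plus the tail of stderr, which is the kind of information it needs to try a workaround.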
