togethercomputer/together-python
Description: The Official Python Client for Together's API
Language: Python
License: Apache-2.0
Stars: 81
Forks: 26
Open issues: 28
Created: 2023-04-05T19:19:13Z
Pushed: 2026-05-19T17:46:11Z
Default branch: main
Fork: no
Archived: no
README:
> [!NOTE]
> ## 🚀 Together Python SDK 2.0 is now available!
>
> V1 is now deprecated and will be kept in maintenance mode. All new features and development will occur in the 2.0 SDK.
>
> Check out the new SDK: [together-py](https://github.com/togethercomputer/together-py)
>
> 📖 Migration Guide: https://docs.together.ai/docs/pythonv2-migration-guide
>
> ### Upgrade
>
> **Using uv (Recommended):**
> ```bash
> uv sync --upgrade-package together
> ```
>
> **Using pip:**
> ```bash
> pip install --upgrade together
> ```
Together V1

> Note: You are looking at the codebase for Together Python V1. The latest Together Python SDK can be found [here.](https://github.com/togethercomputer/together-py)
The Together Python API Library is the official Python client for Together's API platform. It provides a convenient way to interact with the REST APIs and integrates easily with Python 3.10+ applications through easy-to-use synchronous and asynchronous clients.
Installation
To install the Together Python Library from PyPI, run:
pip install together
Setting up API Key
> 🚧 You will need to create an account with Together.ai to obtain a Together API Key.
Once logged in to the Together Playground, you can find your available API keys on the settings page.
Setting environment variable
export TOGETHER_API_KEY=xxxxx
Using the client
from together import Together
client = Together(api_key="xxxxx")
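When the key comes from the environment rather than a literal, it can help to fail early with a clear message if the variable is missing. The following is a minimal sketch under that assumption; `require_api_key` is a hypothetical helper written for illustration and is not part of the `together` package.

```python
import os


def require_api_key(env_var: str = "TOGETHER_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error.

    Hypothetical helper for illustration; not part of the together package.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before creating a client."
        )
    return key


# Usage (requires `pip install together`):
#   from together import Together
#   client = Together(api_key=require_api_key())
```

Passing the key explicitly keeps the failure at startup rather than at the first API call.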
This repo contains both a Python Library and a CLI. We'll demonstrate how to use both below.
Usage – Python Client
Chat Completions
from together import Together
client = Together()
# Simple text message
response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "tell me about new york"}],
)
print(response.choices[0].message.content)
# Multi-modal message with text and image
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
                }
            }
        ]
    }]
)
print(response.choices[0].message.content)
# Multi-modal message with multiple images
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Compare these two images."
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
                }
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/slack.png"
                }
            }
        ]
    }]
)
print(response.choices[0].message.content)
# Multi-modal message with text and video
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's happening in this video?"
            },
            {
                "type": "video_url",
                "video_url": {
                    "url": "http://commondatastorage.googleapis.com/gtv-videos-bucket/sample/ForBiggerFun.mp4"
                }
            }
        ]
    }]
)
print(response.choices[0].message.content)
The chat completions API supports three types of content:
- Plain text messages using the content field directly
- Multi-modal messages with images using type: "image_url"
- Multi-modal messages with videos using type: "video_url"
When using multi-modal content, the content field becomes an array of content objects, each with its own type and corresponding data.
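The content shapes described above can be assembled programmatically. The following is a minimal sketch, assuming only the message structure shown in the examples; `build_user_message` is a hypothetical helper written for illustration and is not part of the `together` package.

```python
from typing import Any, Dict, List, Sequence


def build_user_message(
    text: str,
    image_urls: Sequence[str] = (),
    video_urls: Sequence[str] = (),
) -> Dict[str, Any]:
    """Build a user message in the shapes shown in the examples above.

    Hypothetical helper for illustration; not part of the together package.
    """
    # Plain text: `content` stays a simple string.
    if not image_urls and not video_urls:
        return {"role": "user", "content": text}

    # Multi-modal: `content` becomes a list of typed content objects.
    parts: List[Dict[str, Any]] = [{"type": "text", "text": text}]
    for url in image_urls:
        parts.append({"type": "image_url", "image_url": {"url": url}})
    for url in video_urls:
        parts.append({"type": "video_url", "video_url": {"url": url}})
    return {"role": "user", "content": parts}


# Usage: pass the result as an element of `messages`, e.g.
#   client.chat.completions.create(model=..., messages=[build_user_message(...)])
```

Keeping the plain-text case as a string mirrors the first example, so text-only callers need no special handling.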
Streaming
import os
from together import Together
client = Together()
stream = client.chat.completions.create(
model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
messages=[{"role": "user", "content": "tell me about new york"}],
stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
Async usage
import asyncio
from together import AsyncTogether
async_client = AsyncTogether()
messages = [
"What are the top things to do in San Francisco?",
"What country is Paris in?",
]
async def async_chat_completion(messages):
    async_client = AsyncTogether()
    # Create one request per message; they are awaited concurrently below
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
            messages=[{"role": "user", "content": message}],
        )
        for message in messages
    ]
    responses = await asyncio.gather(*tasks)
    for response in responses:
        print(response.choices[0].message.content)
asyncio.run(async_chat_completion(messages))
Fetching logprobs
Logprobs are logarithms of token-level generation probabilities that indicate the likelihood of the generated token based on the previous tokens in the context. Logprobs allow us to estimate the model's confidence in its outputs, which can be used to decide how to optimally consume the model's output (e.g. rejecting low confidence outputs, retrying or ensembling model outputs etc).
from together import Together
client = Together()
response = client.chat.completions.create(
model="meta-llama/Llama-3.2-3B-Instruct-Turbo",
messages=[{"role": "user", "content": "tell me about new york"}],
logprobs=1
)
response_logprobs = response.choices[0].logprobs
print(dict(zip(response_logprobs.tokens, response_logprobs.token_logprobs)))
# {'New': -2.384e-07, ' York': 0.0, ',': 0.0, ' also': -0.20703125, ' known': -0.20214844, ' as': -8.34465e-07, ... }
More details about using logprobs in Together's API can be found here.
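The confidence-based filtering mentioned above (for example, rejecting low-confidence outputs) can be sketched with plain Python over a list of per-token logprobs such as `token_logprobs`. This is only an illustration: `mean_logprob` and `is_confident` are hypothetical helpers, and the `0.8` threshold is an arbitrary assumption that would need tuning per model and task.

```python
import math
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Average the per-token log-probabilities of one generation."""
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return sum(token_logprobs) / len(token_logprobs)


def is_confident(token_logprobs: Sequence[float], min_mean_prob: float = 0.8) -> bool:
    """Accept a generation if the geometric mean of its token probabilities
    reaches `min_mean_prob` (an assumed, arbitrary threshold)."""
    # exp(mean of logprobs) equals the geometric mean of the token probabilities.
    return math.exp(mean_logprob(token_logprobs)) >= min_mean_prob


# Usage with a response requested with logprobs enabled:
#   logprobs = response.choices[0].logprobs.token_logprobs
#   if not is_confident(logprobs):
#       ...  # retry, or fall back to another model
```

Averaging in log space keeps long outputs from being penalized purely for their length, which a raw product of probabilities would do.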
Completions
Completions are for the code and language models listed here. A code model example is shown below.
from together import…
Excerpt shown — open the source for the full document.