togethercomputer/together-py

Language: Python

License: Apache-2.0

Stars: 10

Forks: 1

Open issues: 3

Created: 2024-05-10T05:10:26Z

Pushed: 2026-06-10T18:36:25Z

Default branch: main

Fork: no

Archived: no

README:

Together Python API library

The Together Python library provides convenient access to the Together REST API from any Python 3.9+ application. The library includes type definitions for all request params and response fields, and offers both synchronous and asynchronous clients powered by httpx.

It is generated with Stainless.

Documentation

The REST API documentation can be found on docs.together.ai. The full API of this library can be found in [api.md](api.md).

Installation

pip install together
uv add together

Usage

import os
from together import Together

client = Together(
    api_key=os.environ.get("TOGETHER_API_KEY"),  # This is the default and can be omitted
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test!",
        }
    ],
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
)
print(chat_completion.choices)

While you can provide an api_key keyword argument, we recommend using python-dotenv to add TOGETHER_API_KEY="My API Key" to your .env file so that your API Key is not stored in source control.
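A missing key is easier to diagnose at startup than on the first request. The following is a minimal sketch of a fail-fast check using only the standard library; `require_api_key` is a hypothetical helper for illustration, not part of the together package.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return TOGETHER_API_KEY from `env`, or raise a clear error if it is unset.

    Hypothetical helper: the library itself reads this variable by default.
    """
    key = env.get("TOGETHER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "TOGETHER_API_KEY is not set; export it or add it to your .env file"
        )
    return key
```

The result can be passed as `Together(api_key=require_api_key())`. If you use python-dotenv, calling `load_dotenv()` before this helper populates the environment from your .env file.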

Async usage

Simply import AsyncTogether instead of Together and use await with each API call:

import os
import asyncio
from together import AsyncTogether

client = AsyncTogether(
    api_key=os.environ.get("TOGETHER_API_KEY"),  # This is the default and can be omitted
)

async def main() -> None:
    chat_completion = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Say this is a test!",
            }
        ],
        model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    )
    print(chat_completion.choices)

asyncio.run(main())

Functionality between the synchronous and asynchronous clients is otherwise identical.
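The async client pays off when several requests run at once. The sketch below shows one common pattern, a bounded fan-out with `asyncio.gather` and a semaphore. `run_limited` and `fake_call` are hypothetical names; `fake_call` stands in for a coroutine that would await `client.chat.completions.create(...)`, so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def run_limited(
    prompts: Sequence[str],
    call: Callable[[str], Awaitable[str]],
    limit: int = 4,
) -> List[str]:
    """Run `call` for every prompt with at most `limit` in flight.

    Results are returned in the same order as `prompts`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt: str) -> str:
        async with semaphore:
            return await call(prompt)

    return list(await asyncio.gather(*(guarded(p) for p in prompts)))


async def fake_call(prompt: str) -> str:
    # Stand-in for a real request (assumption): replace with a coroutine that
    # awaits client.chat.completions.create(...) and returns the text you need.
    await asyncio.sleep(0)
    return prompt.upper()


results = asyncio.run(run_limited(["a", "b", "c"], fake_call, limit=2))
print(results)
```

Capping concurrency with a semaphore keeps a large batch from tripping rate limits all at once; the right limit depends on your account and is not specified here.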

With aiohttp

By default, the async client uses httpx for HTTP requests. However, for improved concurrency performance you may also use aiohttp as the HTTP backend.

First, install the aiohttp extra:

# install from PyPI
pip install --pre 'together[aiohttp]'

Then pass http_client=DefaultAioHttpClient() when instantiating the client:

import os
import asyncio
from together import DefaultAioHttpClient
from together import AsyncTogether

async def main() -> None:
    async with AsyncTogether(
        api_key=os.environ.get("TOGETHER_API_KEY"),  # This is the default and can be omitted
        http_client=DefaultAioHttpClient(),
    ) as client:
        chat_completion = await client.chat.completions.create(
            messages=[
                {
                    "role": "user",
                    "content": "Say this is a test!",
                }
            ],
            model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
        )
        print(chat_completion.choices)

asyncio.run(main())

Streaming responses

We provide support for streaming responses using Server-Sent Events (SSE).

from together import Together

client = Together()

stream = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test!",
        }
    ],
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    stream=True,
)
for chat_completion in stream:
    print(chat_completion.choices)

The async client uses the exact same interface.

import asyncio
from together import AsyncTogether

client = AsyncTogether()

async def main() -> None:
    # await and async for are only valid inside a coroutine
    stream = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Say this is a test!",
            }
        ],
        model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
        stream=True,
    )
    async for chat_completion in stream:
        print(chat_completion.choices)

asyncio.run(main())
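Streamed chunks usually need to be stitched back into one string. The sketch below assumes each chunk exposes its incremental text at `choices[0].delta.content`, the shape used by OpenAI-compatible chat streams; check api.md to confirm this for your version. `collect_text` and `fake_chunk` are hypothetical helpers, and the fake chunks let the example run without calling the API.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional


def collect_text(stream: Iterable[Any]) -> str:
    """Concatenate the text deltas from an iterable of streamed chunks.

    Assumes chunk.choices[0].delta.content holds the incremental text
    (an assumption about the chunk shape, not confirmed by this README).
    """
    parts: List[str] = []
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        content = getattr(chunk.choices[0].delta, "content", None)
        if content:
            parts.append(content)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    # Stand-in object mimicking the assumed chunk shape.
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hello"), fake_chunk(None), fake_chunk(", world")]
print(collect_text(demo))
```

With a real client, the same function could be called as `collect_text(stream)` on the object returned by `create(..., stream=True)`.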

Using types

Nested request parameters are TypedDicts. Responses are Pydantic models which also provide helper methods for things like:

  • Serializing back into JSON, model.to_json()
  • Converting to a dictionary, model.to_dict()

Typed requests and responses provide autocomplete and documentation within your editor. If you would like to see type errors in VS Code to help catch bugs earlier, set python.analysis.typeCheckingMode to basic.

Nested params

Nested parameters are dictionaries, typed using TypedDict, for example:

from together import Together

client = Together()

chat_completion = client.chat.completions.create(
    messages=[
        {
            "content": "content",
            "role": "system",
        }
    ],
    model="model",
    reasoning={},
)
print(chat_completion.reasoning)
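To make the TypedDict idea concrete, here is a standalone sketch using only the typing module. `MessageParam` and `build_messages` are hypothetical stand-ins for illustration; the library ships its own parameter types, which are listed in api.md.

```python
from typing import List, Literal, TypedDict


class MessageParam(TypedDict):
    # Hypothetical shape mirroring the message dictionaries used above.
    role: Literal["system", "user", "assistant"]
    content: str


def build_messages(system: str, user: str) -> List[MessageParam]:
    """Build a two-message conversation as plain, type-checked dictionaries."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


messages = build_messages("Be brief.", "Say this is a test!")
print(messages[1]["content"])
```

A TypedDict is an ordinary dict at runtime, so nothing changes in how you pass it. The benefit is that a type checker can flag a misspelled key or an invalid role before the request is sent.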

The async client uses the exact same interface. If you pass a `PathLike` instance, the file contents will be read asynchronously automatically.

Handling errors

When the library is unable to connect to the API (for example, due to network connection problems or a timeout), a subclass of together.APIConnectionError is raised.

When the API returns a non-success status code (that is, 4xx or 5xx response), a subclass of together.APIStatusError is raised, containing status_code and response properties.

All errors inherit from together.APIError.

import together
from together import Together

client = Together()

try:
    client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Say this is a test",
            }
        ],
        model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    )
except together.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)  # an underlying Exception, likely raised within httpx.
except together.RateLimitError as e:
    print("A 429 status code was received; we should back off a bit.")
except together.APIStatusError as e:
    print("Another non-200-range status code was received")
    print(e.status_code)
    print(e.response)

Error codes are as follows:

| Status Code | Error Type                 |
| ----------- | -------------------------- |
| 400         | BadRequestError            |
| 401         | AuthenticationError        |
| 403         | PermissionDeniedError      |
| 404         | NotFoundError              |
| 422         | UnprocessableEntityError   |
| 429         | RateLimitError             |
| >=500       | InternalServerError        |
| N/A         | APIConnectionError         |
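The mapping above can be read as a lookup with one range rule. The sketch below encodes it as a plain function that returns the error type name as a string. `error_type_for` is a hypothetical helper, not a library function, and the fallback to `APIStatusError` for unlisted 4xx codes is an assumption based on the base class described earlier.

```python
from typing import Optional

# Exact status codes listed in the table above.
_EXACT = {
    400: "BadRequestError",
    401: "AuthenticationError",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    422: "UnprocessableEntityError",
    429: "RateLimitError",
}


def error_type_for(status_code: Optional[int]) -> str:
    """Return the error type name the table associates with a status code.

    None means no HTTP response was received (connection failure).
    """
    if status_code is None:
        return "APIConnectionError"
    if status_code < 400:
        raise ValueError(f"{status_code} is not an error status")
    if status_code in _EXACT:
        return _EXACT[status_code]
    if status_code >= 500:
        return "InternalServerError"
    # Assumption: other 4xx codes surface as the generic base class.
    return "APIStatusError"


print(error_type_for(429))
print(error_type_for(503))
```

In real code you would catch the exception classes directly, as in the try/except example above; this function only illustrates how the table's rows and the `>=500` rule combine.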

Retries

Certain errors are automatically retried 2 times by default, with a short exponential backoff. Connection…
