What does this repo signal mean?

Meta AI (Llama) published meta-llama/llama-api-python (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo meta-llama/llama-api-python · language Python · Minor repo with low stars. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Meta AI (Llama) Repo: meta-llama/llama-api-python

Captured source

GitHub: github.com/meta-llama/llama-api-python

meta-llama/llama-api-python repository metadata

published Mar 24, 2025 · seen Jun 5 · captured Jun 11 · http 200 · method plain

meta-llama/llama-api-python

Description: The official Python library for the Llama API

Language: Python

License: MIT

Stars: 63

Forks: 37

Open issues: 9

Created: 2025-03-24T17:10:39Z

Pushed: 2026-05-12T19:09:27Z

Default branch: main

Fork: no

Archived: no

README:

Llama API Client Python API library

The Llama API Client Python library provides convenient access to the Llama API Client REST API from any Python 3.9+ application. The library includes type definitions for all request params and response fields, and offers both synchronous and asynchronous clients powered by httpx.

It is generated with Stainless.

Documentation

The REST API documentation can be found on llama.developer.meta.com. The full API of this library can be found in [api.md](api.md).

Installation

pip install llama-api-client

Usage

import os
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient(
    api_key=os.environ.get("LLAMA_API_KEY"),  # This is the default and can be omitted
)

create_chat_completion_response = client.chat.completions.create(
    messages=[
        {
            "content": "string",
            "role": "user",
        }
    ],
    model="model",
)
print(create_chat_completion_response.completion_message)

While you can provide an api_key keyword argument, we recommend using python-dotenv to add LLAMA_API_KEY="My API Key" to your .env file so that your API Key is not stored in source control.
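Because `os.environ.get` returns `None` when the variable is missing, a misconfigured environment may only surface later as an authentication error. The sketch below fails early instead. `require_api_key` is a hypothetical helper, not part of the library, and it uses only the standard library rather than python-dotenv.

```python
import os
from typing import Mapping, Optional


def require_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "LLAMA_API_KEY",
) -> str:
    """Return the API key from the environment, or raise if it is unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    source = os.environ if env is None else env
    key = source.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return key
```

The result could then be passed as `api_key=require_api_key()` when constructing the client.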

Async usage

Simply import AsyncLlamaAPIClient instead of LlamaAPIClient and use await with each API call:

import os
import asyncio
from llama_api_client import AsyncLlamaAPIClient

client = AsyncLlamaAPIClient(
    api_key=os.environ.get("LLAMA_API_KEY"),  # This is the default and can be omitted
)

async def main() -> None:
    create_chat_completion_response = await client.chat.completions.create(
        messages=[
            {
                "content": "string",
                "role": "user",
            }
        ],
        model="model",
    )
    print(create_chat_completion_response.completion_message)

asyncio.run(main())

Functionality between the synchronous and asynchronous clients is otherwise identical.
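One common reason to pick the async client is to issue several requests concurrently. The following is a minimal sketch of bounded concurrency using only `asyncio`. `run_limited` and `fake_completion` are hypothetical names, and `fake_completion` stands in for an awaited `client.chat.completions.create(...)` call so the example runs without network access.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def run_limited(
    calls: List[Callable[[], Awaitable[T]]],
    limit: int = 4,
) -> List[T]:
    """Run zero-argument async callables concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(call: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:
            return await call()

    # gather returns results in the same order as `calls`
    return await asyncio.gather(*(guarded(c) for c in calls))


async def fake_completion(prompt: str) -> str:
    # Stand-in for an API call (assumption); no request is made.
    await asyncio.sleep(0)
    return prompt.upper()


async def demo() -> List[str]:
    prompts = ["one", "two", "three"]
    # p=p binds each prompt at definition time rather than at call time
    return await run_limited([lambda p=p: fake_completion(p) for p in prompts], limit=2)
```

Running `asyncio.run(demo())` drives all three stand-in calls with no more than two in flight at once.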

With aiohttp

By default, the async client uses httpx for HTTP requests. However, for improved concurrency performance you may also use aiohttp as the HTTP backend.

You can enable this by installing aiohttp:

# install from the production repo
pip install 'llama_api_client[aiohttp] @ git+ssh://git@github.com/meta-llama/llama-api-python.git'

Then you can enable it by instantiating the client with http_client=DefaultAioHttpClient():

import os
import asyncio
from llama_api_client import DefaultAioHttpClient
from llama_api_client import AsyncLlamaAPIClient

async def main() -> None:
    async with AsyncLlamaAPIClient(
        api_key=os.environ.get("LLAMA_API_KEY"),  # This is the default and can be omitted
        http_client=DefaultAioHttpClient(),
    ) as client:
        create_chat_completion_response = await client.chat.completions.create(
            messages=[
                {
                    "content": "string",
                    "role": "user",
                }
            ],
            model="model",
        )
        print(create_chat_completion_response.completion_message)

asyncio.run(main())

Streaming responses

We provide support for streaming responses using Server-Sent Events (SSE).

from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

stream = client.chat.completions.create(
    messages=[
        {
            "content": "string",
            "role": "user",
        }
    ],
    model="model",
    stream=True,
)
for chunk in stream:
    print(chunk.event.delta.text, end="", flush=True)

The async client uses the exact same interface.

import asyncio
from llama_api_client import AsyncLlamaAPIClient

client = AsyncLlamaAPIClient()

async def main() -> None:
    # await and async for are only valid inside a coroutine
    stream = await client.chat.completions.create(
        messages=[
            {
                "content": "string",
                "role": "user",
            }
        ],
        model="model",
        stream=True,
    )
    async for chunk in stream:
        print(chunk.event.delta.text, end="", flush=True)

asyncio.run(main())
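When the full reply is needed as one string rather than printed piece by piece, the deltas can be accumulated. This is a minimal sketch, assuming each chunk exposes `chunk.event.delta.text` as in the examples above. `collect_text` and `fake_chunk` are hypothetical names, and the guard for chunks without text is an assumption, not documented behavior.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_text(stream: Iterable[Any]) -> str:
    """Join the text deltas of a synchronous stream into one string."""
    parts = []
    for chunk in stream:
        text = getattr(chunk.event.delta, "text", None)
        if text:  # skip chunks that carry no text delta (assumed possible)
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    # Hypothetical stand-in with the same attribute path as a streamed chunk.
    return SimpleNamespace(event=SimpleNamespace(delta=SimpleNamespace(text=text)))


fake_stream = [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)]
full_text = collect_text(fake_stream)
```

With a real client, the `stream` object returned by `create(..., stream=True)` would take the place of `fake_stream`.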

Using types

Nested request parameters are TypedDicts. Responses are Pydantic models which also provide helper methods for things like:

Serializing back into JSON, model.to_json()
Converting to a dictionary, model.to_dict()

Typed requests and responses provide autocomplete and documentation within your editor. If you would like to see type errors in VS Code to help catch bugs earlier, set python.analysis.typeCheckingMode to basic.
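To illustrate what a TypedDict request parameter buys you, here is a small sketch. `ChatMessage` and `build_messages` are hypothetical and defined locally; they mirror the message dicts in the examples above and are not the library's own type definitions. The `"system"` and `"assistant"` roles are assumptions; only `"user"` appears in this excerpt.

```python
from typing import List, Literal, TypedDict


class ChatMessage(TypedDict):
    # Hypothetical shape; the library ships its own param types.
    role: Literal["user", "system", "assistant"]
    content: str


def build_messages(prompt: str, system: str = "") -> List[ChatMessage]:
    """Build a message list, optionally prefixed with a system message."""
    messages: List[ChatMessage] = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages
```

At runtime these are plain dicts, so the typing costs nothing, but a type checker would flag a misspelled key or an unexpected role before the request is sent.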

File uploads

Request parameters that correspond to file uploads can be passed as bytes, a `PathLike` instance, or a tuple of (filename, contents, media type).

from pathlib import Path
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

num_bytes = Path("/path/to/file").stat().st_size

# 1. initiate upload session
r = client.uploads.create(
    bytes=num_bytes,
    filename="simpleqa.jsonl",
    mime_type="application/jsonl",
    purpose="messages_finetune",
)

# 2. upload part
client.uploads.part(
    upload_id=r.id,
    data=Path("/path/to/file"),
)

The async client uses the exact same interface. If you pass a `PathLike` instance, the file contents will be read asynchronously automatically.
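The tuple form mentioned above can be built from a path with the standard library. This is a sketch only: `as_file_tuple` is a hypothetical helper, the media type is guessed from the file extension, and the `application/octet-stream` fallback is an assumption rather than something the API is known to require.

```python
import mimetypes
import tempfile
from pathlib import Path
from typing import Tuple


def as_file_tuple(path: Path) -> Tuple[str, bytes, str]:
    """Return (filename, contents, media type) for a file on disk."""
    media_type, _ = mimetypes.guess_type(path.name)
    # Fall back to a generic binary type when the extension is unknown.
    return (path.name, path.read_bytes(), media_type or "application/octet-stream")


# Demonstrate on a throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_bytes(b"hello")
    name, contents, media_type = as_file_tuple(sample)
    size_matches = len(contents) == sample.stat().st_size
```

The length of `contents` equals the `st_size` value that the upload example passes as `bytes=num_bytes`.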

Handling errors

When the library is unable to connect to the API (for example, due to network connection problems or a timeout), a subclass of llama_api_client.APIConnectionError is raised.

When the API returns a non-success status code (that is, 4xx or 5xx response), a subclass of llama_api_client.APIStatusError is raised, containing status_code and response properties.

All errors inherit from llama_api_client.APIError.

import llama_api_client
from llama_api_client import LlamaAPIClient

client = LlamaAPIClient()

try:
    client.chat.completions.create(
        messages=[
            {
                "content": "string",
                "role": "user",
            }
        ],
        model="model",
    )
except llama_api_client.APIConnectionError as e:
    print("The server could not be reached")
    print(e.__cause__)  # an underlying Exception, likely raised within httpx.
except llama_api_client.RateLimitError as e:
    print("A 429 status code was received; we should back off a bit.")
except llama_api_client.APIStatusError as e:
    print("Another non-200-range status code was received")
    print(e.status_code)...
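The "back off a bit" advice in the rate-limit branch can be sketched as a generic exponential-backoff wrapper. `retry_with_backoff` is a hypothetical helper written against the standard library only; the attempt count and delays are illustrative defaults, and the client may already retry some failures on its own, which this excerpt does not cover.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`, retrying on `retry_on` with a doubling delay between attempts."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("max_attempts must be at least 1")
```

In practice, `call` would be a lambda wrapping the chat completion request and `retry_on` would be `(llama_api_client.RateLimitError,)`, so that other status errors still propagate immediately.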

Excerpt shown — open the source for the full document.

Notability

notability 3.0/10

Minor repo with low stars