groq/groq-python
Python
Description: The official Python Library for the Groq API
Language: Python
License: Apache-2.0
Stars: 606
Forks: 59
Open issues: 5
Created: 2024-02-14T22:48:15Z
Pushed: 2026-06-03T14:31:01Z
Default branch: main
Fork: no
Archived: no
README:
Groq Python API library
The Groq Python library provides convenient access to the Groq REST API from any Python 3.10+ application. The library includes type definitions for all request params and response fields, and offers both synchronous and asynchronous clients powered by httpx.
It is generated with Stainless.
Documentation
The REST API documentation can be found on console.groq.com. The full API of this library can be found in [api.md](api.md).
Installation
    # install from PyPI
    pip install groq
Usage
    import os
    from groq import Groq

    client = Groq(
        api_key=os.environ.get("GROQ_API_KEY"),  # This is the default and can be omitted
    )

    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Explain the importance of low latency LLMs",
            }
        ],
        model="openai/gpt-oss-20b",
    )
    print(chat_completion.choices[0].message.content)

While you can provide an api_key keyword argument, we recommend using python-dotenv to add GROQ_API_KEY="My API Key" to your .env file so that your API Key is not stored in source control.
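The key lookup described above can be made explicit. The following is a minimal sketch, using only the standard library, of reading GROQ_API_KEY from the environment and failing early when it is missing. `load_api_key` is a hypothetical helper introduced here for illustration and is not part of the groq package.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "GROQ_API_KEY",
) -> str:
    """Return the API key from the environment, or raise if it is unset or blank.

    Hypothetical helper (not part of groq); `env` defaults to os.environ and
    exists mainly so the function can be exercised without touching the
    real environment.
    """
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return key
```

The result could then be passed as `Groq(api_key=load_api_key())`. If the key lives in a .env file, python-dotenv's `load_dotenv()` would need to run first so that the variable is present in the environment.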
Async usage
Simply import AsyncGroq instead of Groq and use await with each API call:
    import os
    import asyncio
    from groq import AsyncGroq

    client = AsyncGroq(
        api_key=os.environ.get("GROQ_API_KEY"),  # This is the default and can be omitted
    )


    async def main() -> None:
        chat_completion = await client.chat.completions.create(
            messages=[
                {
                    "role": "user",
                    "content": "Explain the importance of low latency LLMs",
                }
            ],
            model="openai/gpt-oss-20b",
        )
        print(chat_completion.choices[0].message.content)


    asyncio.run(main())

Functionality between the synchronous and asynchronous clients is otherwise identical.
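One reason to choose the async client is to have several requests in flight at once. The sketch below shows one way that might look with `asyncio.gather`. `ask_all` is a hypothetical helper, and the only client behaviour it assumes is the `chat.completions.create` call and the `choices[0].message.content` response shape shown above.

```python
import asyncio
from typing import Any, List


async def ask_all(client: Any, prompts: List[str], model: str) -> List[str]:
    """Send one chat completion per prompt concurrently; return replies in prompt order.

    Hypothetical helper: `client` is expected to behave like AsyncGroq.
    """

    async def ask(prompt: str) -> str:
        completion = await client.chat.completions.create(
            messages=[{"role": "user", "content": prompt}],
            model=model,
        )
        return completion.choices[0].message.content

    # gather() returns results in the same order as its arguments,
    # regardless of which request finishes first.
    return list(await asyncio.gather(*(ask(p) for p in prompts)))
```

With a real client this could be invoked as `asyncio.run(ask_all(AsyncGroq(), ["First question", "Second question"], "openai/gpt-oss-20b"))`. Note that `gather` raises the first exception it encounters by default, so a production version would likely want per-request error handling.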
With aiohttp
By default, the async client uses httpx for HTTP requests. However, for improved concurrency performance you may also use aiohttp as the HTTP backend.
You can enable this by installing aiohttp:
    # install from PyPI
    pip install groq[aiohttp]
Then you can enable it by instantiating the client with http_client=DefaultAioHttpClient():
    import os
    import asyncio
    from groq import DefaultAioHttpClient
    from groq import AsyncGroq


    async def main() -> None:
        async with AsyncGroq(
            api_key=os.environ.get("GROQ_API_KEY"),  # This is the default and can be omitted
            http_client=DefaultAioHttpClient(),
        ) as client:
            chat_completion = await client.chat.completions.create(
                messages=[
                    {
                        "role": "user",
                        "content": "Explain the importance of low latency LLMs",
                    }
                ],
                model="openai/gpt-oss-20b",
            )
            print(chat_completion.id)


    asyncio.run(main())

Using types
Nested request parameters are TypedDicts. Responses are Pydantic models which also provide helper methods for things like:
- Serializing back into JSON, model.to_json()
- Converting to a dictionary, model.to_dict()
Typed requests and responses provide autocomplete and documentation within your editor. If you would like to see type errors in VS Code to help catch bugs earlier, set python.analysis.typeCheckingMode to basic.
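To make the TypedDict point concrete, here is a small standalone sketch using only the standard library. `ChatMessage` and `build_messages` are hypothetical stand-ins written for this illustration; they are not the library's own param types, which are listed in api.md.

```python
from typing import List, Literal, TypedDict


class ChatMessage(TypedDict):
    """Hypothetical stand-in for a message param type (not the library's own)."""

    role: Literal["system", "user", "assistant"]
    content: str


def build_messages(system: str, user: str) -> List[ChatMessage]:
    """Build a message list; a type checker validates the keys and role values."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


messages = build_messages("You are a helpful assistant.", "Hello")

# At runtime a TypedDict value is an ordinary dict, so plain dict literals
# are all a caller needs to write; the types only matter to the checker.
print(type(messages[0]) is dict)
```

A type checker would flag a misspelled key or a role outside the allowed literals in the dict literals above, while the code at runtime stays plain dictionaries.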
Nested params
Nested parameters are dictionaries, typed using TypedDict, for example:
    from groq import Groq

    client = Groq()

    chat_completion = client.chat.completions.create(
        messages=[
            {
                "content": "string",
                "role": "system",
            }
        ],
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        compound_custom={},
    )
    print(chat_completion.compound_custom)

File uploads
Request parameters that correspond to file uploads can be passed as bytes, a `PathLike` instance, or a tuple of (filename, contents, media type).
    from pathlib import Path
    from groq import Groq

    client = Groq()

    client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=Path("/path/to/file"),
    )

The async client uses the exact same interface. If you pass a `PathLike` instance, the file contents will be read asynchronously automatically.
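The tuple form of a file upload is the least obvious of the three accepted shapes, so here is a minimal sketch of building one from a path with the standard library. `as_upload_tuple` is a hypothetical helper, and the fallback media type is an assumption of this example rather than something the library specifies.

```python
import mimetypes
from pathlib import Path
from typing import Tuple


def as_upload_tuple(path: Path) -> Tuple[str, bytes, str]:
    """Return (filename, contents, media type) for a file on disk.

    Hypothetical helper: the media type is guessed from the file extension.
    """
    media_type, _encoding = mimetypes.guess_type(path.name)
    # Assumed fallback: a generic binary type when the extension is unknown.
    return (path.name, path.read_bytes(), media_type or "application/octet-stream")
```

The result could be passed where the example above passes a `Path`, for instance `file=as_upload_tuple(Path("/path/to/file"))`. Building the tuple yourself is mainly useful when the filename or media type needs to be set explicitly.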
Handling errors
When the library is unable to connect to the API (for example, due to network connection problems or a timeout), a subclass of groq.APIConnectionError is raised.
When the API returns a non-success status code (that is, 4xx or 5xx response), a subclass of groq.APIStatusError is raised, containing status_code and response properties.
All errors inherit from groq.APIError.
    import groq
    from groq import Groq

    client = Groq()

    try:
        client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful assistant.",
                },
                {
                    "role": "user",
                    "content": "Explain the importance of low latency LLMs",
                },
            ],
            model="openai/gpt-oss-20b",
        )
    except groq.APIConnectionError as e:
        print("The server could not be reached")
        print(e.__cause__)  # an underlying Exception, likely raised within httpx.
    except groq.RateLimitError as e:
        print("A 429 status code was received; we should back off a bit.")
    except groq.APIStatusError as e:
        print("Another non-200-range status code was received")
        print(e.status_code)
        print(e.response)

Error codes are as follows:
| Status Code | Error Type                 |
| ----------- | -------------------------- |
| 400         | BadRequestError            |
| 401         | AuthenticationError        |
| 403         | PermissionDeniedError      |
| 404         | NotFoundError              |
| 422         | UnprocessableEntityError   |
| 429         | RateLimitError             |
| >=500       | InternalServerError        |
| N/A         | APIConnectionError         |
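The table can be read as a lookup with one range rule for server errors. The sketch below restates it as a plain function that returns class names as strings. `error_name_for_status` is a hypothetical helper, and the `APIStatusError` fallback for codes the table does not list is an assumption based on the earlier statement that every non-success status raises some subclass of it.

```python
from typing import Dict

# Exact status codes from the table, mapped to exception class names.
_EXACT_STATUS_ERRORS: Dict[int, str] = {
    400: "BadRequestError",
    401: "AuthenticationError",
    403: "PermissionDeniedError",
    404: "NotFoundError",
    422: "UnprocessableEntityError",
    429: "RateLimitError",
}


def error_name_for_status(status_code: int) -> str:
    """Return the error class name the table associates with a status code."""
    if status_code in _EXACT_STATUS_ERRORS:
        return _EXACT_STATUS_ERRORS[status_code]
    if status_code >= 500:
        # The ">=500" row covers the whole server-error range.
        return "InternalServerError"
    # Assumption: codes not listed in the table are reported here under the
    # base class name, since the table does not say which subclass applies.
    return "APIStatusError"
```

For example, `error_name_for_status(503)` yields `"InternalServerError"`. In real code the same ordering matters for `except` clauses: specific classes such as `groq.RateLimitError` need to be caught before the broader `groq.APIStatusError`.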
Retries
Certain errors are automatically retried 2 times by default, with a short exponential backoff. Connection errors (for example, due to a network connectivity problem), 408 Request Timeout, 409 Conflict, 429 Rate Limit, and >=500 Internal errors are all retried by default.
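As a rough illustration of the retry policy described above, the sketch below encodes which outcomes are retried and what an exponential backoff schedule could look like. Both functions are hypothetical. The base delay and cap are assumptions chosen for the example, since the text only says "short exponential backoff", and the real client may differ (for instance by adding jitter).

```python
from typing import List, Optional

# Status codes listed as retried by default, besides the >=500 range.
RETRYABLE_STATUS_CODES = frozenset({408, 409, 429})


def is_retryable(status_code: Optional[int]) -> bool:
    """Return True if a request outcome falls under the default retry rules.

    None stands for a connection error, where no HTTP response was received.
    """
    if status_code is None:
        return True
    return status_code in RETRYABLE_STATUS_CODES or status_code >= 500


def backoff_delays(max_retries: int = 2, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Return one delay in seconds per retry: base, 2*base, 4*base, ..., capped at `cap`.

    `base` and `cap` are assumed values for illustration only.
    """
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]
```

With the default of two retries, a request is attempted at most three times in total: the original call plus two retries.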
You can use the max_retries option to configure or disable retry settings:
    from groq import Groq

    # Configure the default for all requests:
    client = Groq(…
Excerpt shown — open the source for the full document.