Repo · Replicate · published May 11, 2022

replicate/replicate-python

Python


Description: Python client for Replicate

Language: Python

License: Apache-2.0

Stars: 906

Forks: 269

Open issues: 66

Created: 2022-05-11T01:26:58Z

Pushed: 2025-08-26T17:18:17Z

Default branch: main

Fork: no

Archived: no

README:

Replicate Python client

This is a Python client for Replicate. It lets you run models from your Python code or Jupyter notebook, and do various other things on Replicate.

Breaking Changes in 1.0.0

The 1.0.0 release contains breaking changes:

  • The replicate.run() method now returns FileOutputs instead of URL strings by default for models that output files. FileOutput implements an iterable interface similar to httpx.Response, making it easier to work with files efficiently.

To revert to the previous behavior, you can opt out of FileOutput by passing use_file_output=False to replicate.run():

output = replicate.run("acmecorp/acme-model", use_file_output=False)

In most cases, updating existing applications to call output.url should resolve any issues. However, we recommend using the FileOutput objects directly: we have further improvements planned for this API, and this approach is guaranteed to give the fastest results.
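As a rough illustration of that migration path, the helper below accepts either kind of output: a plain URL string (the pre-1.0.0 behavior, or use_file_output=False) or an object exposing a url attribute, as FileOutput does per the note above. The name output_url is hypothetical and not part of the client; treat this as a sketch rather than the library's own API.

```python
from typing import Any


def output_url(output: Any) -> str:
    """Return a URL for a legacy string output or a FileOutput-like object.

    Hypothetical helper for illustration; not part of the replicate package.
    """
    if isinstance(output, str):
        # Pre-1.0.0 behavior (or use_file_output=False): already a URL string.
        return output
    # FileOutput-style object: the URL is exposed on the .url attribute.
    return output.url


# Possible usage, assuming REPLICATE_API_TOKEN is set and the model outputs a file:
#   import replicate
#   output = replicate.run("acmecorp/acme-model")
#   print(output_url(output))
```

Code written this way keeps working whether or not a caller has opted out of FileOutput.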

> [!TIP]
> 👋 Check out an interactive version of this tutorial on Google Colab.
>
> [Open In Colab](https://colab.research.google.com/drive/1K91q4p-OhL96FHBAVLsv9FlwFdu6Pn3c)

Requirements

  • Python 3.8+

Install

pip install replicate

Authenticate

Before running any Python scripts that use the API, you need to set your Replicate API token in your environment.

Grab your token from replicate.com/account and set it as an environment variable:

export REPLICATE_API_TOKEN=<your-token>

We recommend not adding the token directly to your source code, because you don't want to put your credentials in source control. If anyone used your API key, their usage would be charged to your account.
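If you want a script to fail early with a clear message when the variable is missing, a small check along these lines can help. The function name require_token is hypothetical; the only thing taken from the instructions above is the REPLICATE_API_TOKEN variable name.

```python
import os
from typing import Mapping


def require_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the Replicate API token from the environment, or raise if unset.

    Hypothetical helper for illustration; not part of the replicate package.
    """
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it before running this script."
        )
    return token
```

Calling it once at startup surfaces a missing token before any prediction is attempted, and the token itself never appears in source control.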

Alternative authentication

As of replicate 1.0.7 and cog 0.14.11 it is possible to pass a REPLICATE_API_TOKEN via the context as part of a prediction request.

The Replicate() constructor now uses this context when it is available. This lets cog models use the Replicate client libraries, scoped to a user on a per-request basis.

Run a model

Create a new Python file and add the following code, replacing the model identifier and input with your own:

>>> import replicate
>>> outputs = replicate.run(
    "black-forest-labs/flux-schnell",
    input={"prompt": "astronaut riding a rocket like a horse"}
)
[<replicate.helpers.FileOutput object at 0x...>]
>>> for index, output in enumerate(outputs):
    with open(f"output_{index}.webp", "wb") as file:
        file.write(output.read())

replicate.run raises ModelError if the prediction fails. You can access the exception's prediction property to get more information about the failure.

import replicate
from replicate.exceptions import ModelError

try:
    output = replicate.run(
        "stability-ai/stable-diffusion-3",
        input={"prompt": "An astronaut riding a rainbow unicorn"},
    )
except ModelError as e:
    # e.prediction is the failed prediction; its logs and id help with debugging.
    if "(some known issue)" in e.prediction.logs:
        pass

    print("Failed prediction: " + e.prediction.id)
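If you log failures in more than one place, it can be convenient to turn the exception into a short message. The sketch below relies only on the prediction.id and prediction.logs attributes used above; summarize_failure and its max_log_chars parameter are hypothetical names chosen for this example.

```python
from typing import Any


def summarize_failure(error: Any, max_log_chars: int = 200) -> str:
    """Build a short summary from a ModelError-like exception.

    Hypothetical helper for illustration; it only reads error.prediction.id
    and error.prediction.logs.
    """
    prediction = error.prediction
    logs = prediction.logs or ""
    # Keep only the tail of the logs so the message stays short.
    tail = logs[-max_log_chars:].strip()
    return f"Failed prediction {prediction.id}: {tail or 'no logs available'}"


# Possible usage inside the except block:
#   except ModelError as e:
#       print(summarize_failure(e))
```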

> [!NOTE]
> By default the Replicate client will hold the connection open for up to 60 seconds while waiting
> for the prediction to complete. This is designed to optimize getting the model output back to the
> client as quickly as possible.
>
> The timeout can be configured by passing wait=x to replicate.run() where x is a timeout
> in seconds between 1 and 60. To disable the sync mode you can pass wait=False.
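If the timeout comes from user configuration, you may want to map it onto the documented range before passing it along. The helper below is a sketch: normalize_wait is a hypothetical name, and treating None or non-positive values as wait=False is a choice made for this example, not documented client behavior.

```python
from typing import Union


def normalize_wait(seconds: Union[int, float, None]) -> Union[int, bool]:
    """Map a requested timeout onto a value usable as the wait argument.

    Returns False (sync mode disabled) for None or non-positive values,
    otherwise an integer clamped to the documented 1..60 second range.
    """
    if seconds is None or seconds <= 0:
        return False
    return max(1, min(60, int(seconds)))


# Possible usage, assuming REPLICATE_API_TOKEN is set:
#   import replicate
#   output = replicate.run(
#       "black-forest-labs/flux-schnell",
#       input={"prompt": "astronaut riding a rocket like a horse"},
#       wait=normalize_wait(90),  # clamped to 60
#   )
```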

AsyncIO support

You can also use the Replicate client asynchronously by prepending async_ to the method name.

Here's an example of how to run several predictions concurrently and wait for them all to complete:

import asyncio
import replicate

# https://replicate.com/stability-ai/sdxl
model_version = "stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b"
prompts = [
    f"A chariot pulled by a team of {count} rainbow unicorns"
    for count in ["two", "four", "six", "eight"]
]


async def main():
    # asyncio.TaskGroup requires Python 3.11 or later.
    async with asyncio.TaskGroup() as tg:
        tasks = [
            tg.create_task(replicate.async_run(model_version, input={"prompt": prompt}))
            for prompt in prompts
        ]

    # All tasks have finished once the TaskGroup block exits;
    # gather collects their results in the original order.
    results = await asyncio.gather(*tasks)
    print(results)


asyncio.run(main())
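asyncio.TaskGroup was added in Python 3.11, while this client lists Python 3.8+ as its requirement. On older interpreters, or when you want to cap how many predictions run at once, a sketch like the following may be useful. run_all and its limit parameter are hypothetical names; the function takes the coroutine function to call (for example replicate.async_run) as an argument, so it makes no assumptions beyond the call shape shown above.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def run_all(
    run_fn: Callable[..., Awaitable[Any]],
    model: str,
    prompts: Sequence[str],
    limit: int = 2,
) -> List[Any]:
    """Run one prediction per prompt, at most `limit` at a time.

    Results are returned in the same order as `prompts`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(prompt: str) -> Any:
        # The semaphore keeps at most `limit` calls in flight.
        async with semaphore:
            return await run_fn(model, input={"prompt": prompt})

    return await asyncio.gather(*(run_one(prompt) for prompt in prompts))


# Possible usage, assuming REPLICATE_API_TOKEN is set:
#   import replicate
#   results = asyncio.run(run_all(replicate.async_run, model_version, prompts))
```

Unlike a TaskGroup, asyncio.gather does not cancel the remaining calls when one of them raises, so add your own error handling if that matters for your workload.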

To run a model that takes a file input you can pass either a URL to a publicly accessible file on the Internet or a handle to a file on your local device.

>>> output = replicate.run(
    "andreasjansson/blip-2:f677695e5e89f8b236e52ecd1d3f01beb44c34606419bcc19345e046d8f786f9",
    input={ "image": open("path/to/mystery.jpg", "rb") }
)

"an astronaut riding a horse"

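When the same code path has to accept both a URL and a local path, a small wrapper can pick the right form and make sure a local file is closed afterwards. This is a sketch: file_input is a hypothetical helper, and the check for an http:// or https:// prefix is a simplifying assumption about what counts as a URL.

```python
from contextlib import contextmanager
from typing import BinaryIO, Iterator, Union


@contextmanager
def file_input(source: str) -> Iterator[Union[str, BinaryIO]]:
    """Yield a URL unchanged, or an open binary handle for a local path.

    Hypothetical helper for illustration; not part of the replicate package.
    """
    if source.startswith(("http://", "https://")):
        # Publicly accessible URL: pass the string through as-is.
        yield source
    else:
        # Local file: open in binary mode and close it when the block exits.
        with open(source, "rb") as handle:
            yield handle


# Possible usage, assuming REPLICATE_API_TOKEN is set:
#   import replicate
#   with file_input("path/to/mystery.jpg") as image:
#       output = replicate.run("andreasjansson/blip-2:f677695e5e89...", input={"image": image})
```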
Run a model and stream its output

Replicate’s API supports server-sent event streams (SSEs) for language models. Use the stream method to consume tokens as they're produced by the model.

import replicate

for event in replicate.stream(
    "meta/meta-llama-3-70b-instruct",
    input={
        "prompt": "Please write a haiku about llamas.",
    },
):
    print(str(event), end="")

> [!TIP]
> Some models, like meta/meta-llama-3-70b-instruct,
> don't require a version string.
> You can always refer to the API documentation on the model page for specifics.

You can also stream the output of a prediction you create. This is helpful when you want the ID of the prediction separate from its output.

prediction = replicate.predictions.create(
    model="meta/meta-llama-3-70b-instruct",
    input={"prompt": "Please write a haiku about llamas."},
    stream=True,
)

for event in prediction.stream():
    print(str(event), end="")
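Both streaming forms yield events that can be converted with str(), as the examples above do. If you also need the complete text once the stream ends, you can accumulate the pieces as they arrive. The helper below is a sketch; collect_stream and its on_token callback are hypothetical names, and it works on any iterable of events.

```python
from typing import Any, Callable, Iterable, Optional


def collect_stream(
    events: Iterable[Any],
    on_token: Optional[Callable[[str], None]] = None,
) -> str:
    """Consume streamed events and return the concatenated text.

    If `on_token` is given, it is called with each piece as it arrives,
    for example to print tokens while the model is still generating.
    """
    parts = []
    for event in events:
        text = str(event)
        if on_token is not None:
            on_token(text)
        parts.append(text)
    return "".join(parts)


# Possible usage, assuming REPLICATE_API_TOKEN is set:
#   import replicate
#   haiku = collect_stream(
#       replicate.stream(
#           "meta/meta-llama-3-70b-instruct",
#           input={"prompt": "Please write a haiku about llamas."},
#       ),
#       on_token=lambda text: print(text, end=""),
#   )
```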

For more information, see "Streaming output" in Replicate's docs.

Run a model in the background

You can start a model and run it in the background using async mode:

>>> model = replicate.models.get("kvfrans/clipdraw")
>>> version = model.versions.get("5797a99edc939ea0e9242d5e8c9cb3bc7d125b1eac21bda852e5cb79ede2cd9b")
>>> prediction = replicate.predictions.create(
    version=version,
    input={"prompt":"Watercolor painting…
