
Exploring text to image models


Replicate Blog


Posted July 18, 2022 by afiaka87 and rossjillian

I’m Clay, a member of the team at Replicate. In this post, I’ll show you how Replicate lets you easily explore many open text-to-image models.

You can follow along by downloading the accompanying Jupyter notebook.

If you run the notebook in Colab, we recommend using Firefox or Chrome.

Install

It’s wise to use a virtual environment to keep your global Python installation clean. venv or conda will work fine:


python3 -m venv replicate_venv
source replicate_venv/bin/activate

or


conda create --name replicate_venv python
conda activate replicate_venv

In whatever environment you choose, install Replicate’s Python client.


(replicate_venv) % pip install replicate

Login

To use the API, you’ll need an API token, which you can get by subscribing to Replicate. You then authenticate with that token each time you run Python.

Never store your API token directly in a Python file or notebook; anyone who sees that file could use the token to access your account. Instead, set the REPLICATE_API_TOKEN environment variable in your shell before running Python:


(replicate_venv) % REPLICATE_API_TOKEN="..." python run_text2image_model.py
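A script such as run_text2image_model.py can fail fast with a clear message when the variable is missing. This is a minimal sketch: the helper name require_replicate_token is hypothetical, and since the replicate client reads REPLICATE_API_TOKEN on its own, the check only turns a missing token into an earlier, friendlier error.

```python
import os


def require_replicate_token() -> str:
    """Return the API token from the environment, or exit with a helpful message.

    Hypothetical helper: the replicate client reads REPLICATE_API_TOKEN itself;
    this only reports a missing token before any API call is attempted.
    """
    token = os.environ.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise SystemExit(
            "REPLICATE_API_TOKEN is not set. Run, for example:\n"
            '  REPLICATE_API_TOKEN="..." python run_text2image_model.py'
        )
    return token
```

Call require_replicate_token() near the top of the script, before the first request to Replicate.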

If you’re working in a Jupyter notebook, you can use Python’s getpass module to type the token beneath a cell without it being displayed.
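A minimal sketch of that approach, assuming you run it in a cell before `import replicate`. The function name is hypothetical, and the `ask` parameter exists only so the function can be tried without a real terminal; in a notebook you would call it with no arguments.

```python
import os
from getpass import getpass


def prompt_for_replicate_token(ask=getpass) -> None:
    """Put the API token into this process's environment without echoing it.

    Does nothing if REPLICATE_API_TOKEN is already set, so re-running the
    cell will not prompt again.
    """
    if not os.environ.get("REPLICATE_API_TOKEN"):
        os.environ["REPLICATE_API_TOKEN"] = ask("Replicate API token: ")
```

The token only lives in the running kernel's environment, so it never ends up saved in the notebook file.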

Assuming everything worked, you should now be able to import the replicate module. The accompanying notebook also imports pathlib.Path, which we use to pass local files as model inputs.


import replicate
from pathlib import Path

Generate an image from text

With a few lines of Python, you can programmatically generate an image from a text prompt.

Replicate lets us look up models by f"{username}/{model_name}" with replicate.models.get.
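If you build these references from variables, a small check catches typos before they reach the API. A minimal sketch; model_ref and split_model_ref are hypothetical helpers that only handle the plain "username/model_name" form described above.

```python
def model_ref(username: str, model_name: str) -> str:
    """Build the "username/model_name" string passed to replicate.models.get."""
    for part in (username, model_name):
        if not part or "/" in part:
            raise ValueError(f"invalid model reference component: {part!r}")
    return f"{username}/{model_name}"


def split_model_ref(ref: str) -> tuple[str, str]:
    """Inverse of model_ref: "afiaka87/glid-3-xl" -> ("afiaka87", "glid-3-xl")."""
    username, sep, model_name = ref.partition("/")
    if not sep or not username or not model_name or "/" in model_name:
        raise ValueError(f"expected 'username/model_name', got {ref!r}")
    return username, model_name
```

With the real client this would be used as replicate.models.get(model_ref("afiaka87", "glid-3-xl")).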

For the example in our notebook, we’ll use “afiaka87/glid-3-xl”, a great model for generating photorealistic images. For fun, let’s generate an image of an avocado lightbulb!


model = replicate.models.get("afiaka87/glid-3-xl")
version = model.versions.get("d74db2a276065cf0d42fe9e2917219112ddf8c698f5d9acbe1cc353b58097dab")

Models on Replicate are run using the .predict method. Let’s take a quick look at the keyword arguments for .predict.

The keyword arguments vary from model to model. glid-3-xl requires one input, prompt: a description of the scene you would like to visualize.

We also set the seed to 0. Setting a seed manually encourages a model to return the same output for the same set of inputs; otherwise, a seed is chosen at random. Outputs may still differ slightly, but managing the seed is generally a good idea, and many models on Replicate support it.
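Diffusion models such as glid-3-xl typically start from random noise drawn from a seeded generator, so fixing the seed fixes that starting point. The sketch below uses Python’s random module as a stand-in for a model’s noise source; it is an analogy, not Replicate’s implementation, and it does not capture hardware-level nondeterminism, which is one reason outputs can still differ slightly.

```python
import random


def sample_initial_noise(seed=None, n=4):
    """Stand-in for a model's starting noise: n Gaussian samples.

    seed=None seeds the generator unpredictably, like a model that picks a
    random seed when you don't pass one.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


# The same seed reproduces the same starting noise.
same = sample_initial_noise(seed=0) == sample_initial_noise(seed=0)
print(same)  # → True
```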


prediction_generator = version.predict(
    prompt="an image of a fresh avocado in the form of a lightbulb",
    seed=0,
)

Calling .predict returns a generator rather than the finished output. To collect the model’s results, iterate over the generator.


generated_image_batches = list(prediction_generator)
final_image_batch = generated_image_batches[-1]  # ["https://...",]
print(final_image_batch)

Because we only care about the final, finished output the model returns, we can cast the generator to a list and grab the last (-1) element.
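Casting to a list keeps every intermediate batch in memory. If you only need the last one, collections.deque with maxlen=1 consumes the generator while holding just the most recent item. A minimal sketch; fake_prediction is a hypothetical stand-in for the generator returned by .predict, so the example runs without an API token.

```python
from collections import deque
from typing import Iterable, TypeVar

T = TypeVar("T")


def last_output(outputs: Iterable[T]) -> T:
    """Exhaust an iterable and return its final item, keeping only one in memory."""
    tail = deque(outputs, maxlen=1)
    if not tail:
        raise ValueError("the prediction produced no output")
    return tail[0]


def fake_prediction():
    """Hypothetical stand-in for version.predict(...): yields intermediate batches."""
    for step in range(3):
        yield [f"https://example.com/step-{step}.png"]


demo_batch = last_output(fake_prediction())
print(demo_batch)  # → ['https://example.com/step-2.png']
```

With the real model, last_output(prediction_generator) gives the same result as list(prediction_generator)[-1].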

The final batch is a list of URLs whose length is determined by batch_size (1 by default).
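Since each entry in the batch is a URL, you can save the images locally with the standard library. A minimal sketch using urllib.request.urlretrieve; the helper name download_batch, the outputs directory, the image_{i} naming scheme and the .png fallback extension are all assumptions, and the function neither checks file types nor retries failed downloads.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_batch(urls, out_dir="outputs"):
    """Download every URL in a batch into out_dir and return the local paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, url in enumerate(urls):
        # Reuse the remote file's extension if it has one; assume .png otherwise.
        suffix = Path(urlparse(url).path).suffix or ".png"
        dest = out / f"image_{i}{suffix}"
        urlretrieve(url, dest)
        paths.append(dest)
    return paths
```

For example, download_batch(final_image_batch) saves the generated images under ./outputs.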

Enhance an image

The API opens up lots of possibilities, like passing the output from one model as the input to another. A common example of this is upscaling, where an image generated by one model is piped into a super-resolution model to enlarge it.

We’ll use “raoumer/srrescgan” to upscale our image of an avocado lightbulb, but there are many upscaling models on Replicate that you can explore.


# There's only one URL in the list by default. Pass it as a plain string:
# wrapping a URL in Path() would collapse "https://" into "https:/".
generation_to_enhance = final_image_batch[0]
upscaling_model_api = replicate.models.get("raoumer/srrescgan")
high_res_outputs = upscaling_model_api.predict(image=generation_to_enhance)
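With a batch_size above 1 there is more than one URL to upscale. A minimal sketch that pipes every URL through an upscaling call; upscale_batch and fake_upscale are hypothetical, and predict is passed in as a parameter (for example upscaling_model_api.predict) so the loop can be tried without an API token. It assumes the upscaler takes its input through an `image` keyword, as raoumer/srrescgan does above.

```python
from typing import Any, Callable, Iterable, List


def upscale_batch(urls: Iterable[str], predict: Callable[..., Any]) -> List[Any]:
    """Send each generated image URL through an upscaling model, in order."""
    # Each URL is passed unchanged as the `image` input.
    return [predict(image=url) for url in urls]


def fake_upscale(image: str) -> str:
    """Offline stand-in for upscaling_model_api.predict (hypothetical)."""
    return image.replace(".png", "_upscaled.png")


print(upscale_batch(["https://example.com/a.png", "https://example.com/b.png"], fake_upscale))
# → ['https://example.com/a_upscaled.png', 'https://example.com/b_upscaled.png']
```

With the real model: high_res_outputs = upscale_batch(final_image_batch, upscaling_model_api.predict).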

Create variations of an image

Some text-to-image models let you pass in an existing image, called an init image. The model then produces variations of that image, with some influence from the specified prompt.

We’ll use “laion-ai/ongo”, a version of glid-3-xl fine-tuned on WikiArt.

You’ll need an image to create variations of. Here we’ll use ongo to vary an image of a farmhouse.

Image inputs to Replicate may be a URL or local path.
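One practical consequence: a URL should be passed as a plain string, because pathlib normalizes repeated slashes and turns "https://..." into "https:/...". A minimal sketch of a hypothetical helper that wraps local files in Path and leaves URLs alone; treating only http and https schemes as URLs is an assumption.

```python
from pathlib import Path, PurePosixPath
from typing import Union
from urllib.parse import urlparse


def as_image_input(value: str) -> Union[str, Path]:
    """Keep http(s) URLs as strings; turn local file paths into pathlib.Path."""
    if urlparse(value).scheme in ("http", "https"):
        return value  # leave the URL untouched
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"no such image file: {path}")
    return path


# Why URLs should not go through Path: repeated slashes are collapsed.
print(PurePosixPath("https://example.com/farmhouse.jpeg"))
# → https:/example.com/farmhouse.jpeg
```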


init_image = Path("/assets/blog/exploring-text-to-image-models/farmhouse.jpeg")  # replace with the path to your own image

It can be worth tweaking various settings to improve the results, or you can simply remove optional arguments to use the default values set by the model author.

init_image: A pathlib.Path pointing to an “initial image” to mix into the generation, so the model takes influence from this image as well as from the prompt. Can also be a URL, passed as a plain string rather than cast to a Path, since Path collapses the "//" in "https://".

guidance_scale: Float from 1.0 to 100.0. How strongly the generation is guided by your text prompt; higher values generally follow the prompt more closely.

batch_size: Integer from 1 to 12. How many variations to produce. Small batches run much faster than large ones.

steps: Integer from 30 to 250. The number of discrete timesteps to run the model for. When you pass an init_image, only steps * (1 - init_skip_fraction) timesteps are actually run (half as many with the default). More steps generally improve quality at the cost of speed.

init_skip_fraction: Decimal from 0.0 to 1.0, 0.5 by default. How much influence your image has on the generation: 0.0 uses almost none of it, while 1.0 simply encodes your image with no influence from the model.
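The ranges above can be checked before sending a request, and the number of timesteps actually run worked out from steps and init_skip_fraction. A minimal sketch: check_ongo_inputs is hypothetical, the ranges are the ones listed above, the batch_size default of 1 is assumed, and the step formula assumes the model skips the first init_skip_fraction of its schedule, so ongo’s exact rounding may differ.

```python
from typing import Optional


def check_ongo_inputs(steps: int, init_skip_fraction: float = 0.5,
                      batch_size: int = 1,
                      guidance_scale: Optional[float] = None) -> int:
    """Check the documented ranges; return the approximate number of steps run."""
    if not 30 <= steps <= 250:
        raise ValueError(f"steps must be 30-250, got {steps}")
    if not 0.0 <= init_skip_fraction <= 1.0:
        raise ValueError(f"init_skip_fraction must be 0.0-1.0, got {init_skip_fraction}")
    if not 1 <= batch_size <= 12:
        raise ValueError(f"batch_size must be 1-12, got {batch_size}")
    if guidance_scale is not None and not 1.0 <= guidance_scale <= 100.0:
        raise ValueError(f"guidance_scale must be 1.0-100.0, got {guidance_scale}")
    # Assumption: the first init_skip_fraction of the schedule is skipped,
    # so only the remaining (1 - init_skip_fraction) share of steps is run.
    return int(steps * (1.0 - init_skip_fraction))


print(check_ongo_inputs(250))  # → 125
```

For the settings used below (250 steps, init_skip_fraction of 0.35), this estimates about 162 steps.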

When in doubt, you can simply remove an argument and its default will be used instead.


model = replicate.models.get("laion-ai/ongo")
version = model.versions.get("1b3cd15121ec450baa71bbbdacddef9217519f12ca12ccfef36eeaa20ad89b9d")
ongo_variation_generator = version.predict(
    prompt="professional painting of a red lakehouse in the style of monet",
    guidance_scale=10.0,  # 1.0 - 100.0
    total_steps=250,  # 30 - 250
    init_skip_fraction=0.35,  # 0.0 - 1.0
    batch_size=3,  # 1 - 12
    init_image=init_image,  # the farmhouse Path defined above
    seed=0,
)

Recall that .predict returns a generator. You must first iterate over it to get your final…
