Replicate Writing: Automating image collection

replicate.com/replicate.com/blog/grab-hundreds-of-images-with-clip-and-laion


Automating image collection

Posted August 5, 2022 by afiaka87

Collecting images enables us to customize powerful machine learning models in new and exciting ways. For example, some of the text-to-image models on Replicate can be steered using an existing image. This capability is great when we want to steer vision models toward a particular scene or aesthetic, but it requires that we have example images of our own.

I’m Clay, a member of LAION and of the team at Replicate. In this post, I’m going to show you how to use a pip package called clip-retrieval to collect hundreds of images (and captions) from the LAION-5B dataset. We’ll look at how to collect images that either match a text description or have a similar style to some existing images.

clip-retrieval was developed by a fellow member of LAION, Romain Beaumont. It works by embedding the billions of images and captions in the LAION dataset with CLIP. Using the magic of k-NN and autofaiss, we can create an in-memory index over these embeddings with fairly fast retrieval times. If you’re interested in how this works on a technical level, I recommend reading Romain’s article “Semantic search with embeddings: index anything”.
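To make the idea of nearest-neighbour search over embeddings concrete, here is a minimal sketch. It is not how clip-retrieval or autofaiss work internally: it swaps the approximate faiss index for an exact brute-force search in NumPy, and the 2-D vectors are toy stand-ins for real CLIP embeddings. The function names `normalize` and `knn_search` are made up for this illustration.

```python
import numpy as np


def normalize(vectors):
    """Scale each vector to unit length so a dot product equals cosine similarity."""
    vectors = np.asarray(vectors, dtype="float32")
    return vectors / np.linalg.norm(vectors, axis=-1, keepdims=True)


def knn_search(query_emb, index_embs, k=3):
    """Return (indices, similarities) of the k rows of index_embs closest to query_emb."""
    sims = normalize(index_embs) @ normalize(query_emb)
    top = np.argsort(-sims)[:k]  # highest cosine similarity first
    return top, sims[top]


# Toy 2-D "embeddings" standing in for CLIP vectors (hypothetical data).
toy_index = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
toy_query = np.array([0.9, 0.1])

indices, similarities = knn_search(toy_query, toy_index, k=2)
print(indices, similarities)
```

A brute-force search like this compares the query against every vector, which is fine for a few thousand items but not for billions. That is the gap an approximate index is meant to fill.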

Getting started

Let’s get started by installing clip-retrieval:

pip install clip-retrieval

clip-retrieval lets us query pre-built CLIP faiss indexes using the ClipClient class from clip_retrieval.clip_client. By default, queries are sent to the free, hosted knn index over LAION-5B built by LAION-AI.

We can set a custom num_images to return. Let’s use 400 for now.

from clip_retrieval.clip_client import ClipClient, Modality

laion5b_search_client = ClipClient(
    url="https://knn5.laion.ai/knn-service",  # url may change, check github.com/rom1504/clip-retrieval
    indice_name="laion5B",
    num_images=400,
)

Query LAION-5B with text

After getting set up, we can query the backend:

results = laion5b_search_client.query(text="fresh avocado, digital art")

The response will be a JSON array of results containing a caption, url, and similarity.

[
  {
    "caption": "Авокадо",
    "url": "https://t1.ftcdn.net/jpg/00/79/43/44/240_F_79434473_qNSi5WUEi8y3oFrwPjupQvxbUIzXY7mE.jpg",
    "id": 4540616960,
    "similarity": 0.5977489948272705
  }
  // ...more results
]
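Since each result carries a similarity score, one way to tidy up a response is to drop weak matches and sort the rest. The sketch below assumes results are dicts shaped like the response above. The helper name `filter_results`, the threshold of 0.3, and the sample data are all made up for illustration.

```python
def filter_results(results, min_similarity=0.3):
    """Keep results that have a URL and meet the similarity threshold, best match first."""
    kept = [
        r for r in results
        if r.get("url") and r.get("similarity", 0.0) >= min_similarity
    ]
    return sorted(kept, key=lambda r: r["similarity"], reverse=True)


# Hand-written sample in the same shape as a query response (values are invented).
sample_results = [
    {"caption": "avocado sketch", "url": "https://example.com/a.jpg", "id": 1, "similarity": 0.41},
    {"caption": "green fruit", "url": "https://example.com/b.jpg", "id": 2, "similarity": 0.58},
    {"caption": "unrelated", "url": "https://example.com/c.jpg", "id": 3, "similarity": 0.12},
]

for r in filter_results(sample_results):
    print(f'{r["similarity"]:.2f}  {r["url"]}  {r["caption"]}')
```

A sensible threshold depends on the query, so it is worth eyeballing a few results before settling on one.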

the second result for “fresh avocado, digital art”

Because the API de-duplicates results, we won’t get exactly 400 back.

print(len(results))
# 321

But, hey - 321 isn’t half bad!
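The query only returns URLs and captions, so if we want the images on disk we still have to download them. Here is a minimal sketch of that step, not part of clip-retrieval itself: `fetch_bytes` and `save_images` are hypothetical helpers, the `.jpg` extension is an assumption (LAION links point at many formats), and failed downloads are skipped on the assumption that some links in a web-scraped dataset will be dead.

```python
import os
import urllib.request


def fetch_bytes(url, timeout=10):
    """Download a URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_images(results, out_dir, fetch=fetch_bytes):
    """Save each result's image as <id>.jpg in out_dir and return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for result in results:
        path = os.path.join(out_dir, f'{result["id"]}.jpg')
        try:
            data = fetch(result["url"])
        except (OSError, ValueError):
            # URLError and timeouts are OSError subclasses; ValueError covers malformed URLs.
            continue
        with open(path, "wb") as f:
            f.write(data)
        saved.append(path)
    return saved
```

With the search results from earlier, usage might look like `save_images(results, "avocado_images")`. The `fetch` parameter exists so the download step can be replaced, for example with a stub when testing offline.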

Get variations of your image with a text2image model

I love to use this as a way of finding good init images for various text-to-image models. Init images guide a text-to-image model to produce different variations of your image, with some influence from the specified prompt. In some cases, using an init image can even make the model run faster (I also mentioned init images in my previous blog post).

We can use Replicate to explore the effect of init images easily. First, we get set up on Replicate:

pip install replicate

Grab your API token from here, then set your API token as an environment variable.

export REPLICATE_API_TOKEN=...

Now, we can run text-to-image models remotely! I use “afiaka87/glid-3-xl”, a photorealistic image model that takes a prompt and an init_image argument. The init_image argument conveniently accepts URLs, so there’s no need to download clip-retrieval results in advance. Let’s use the first result from our search as an init image:

import replicate

model = replicate.models.get("afiaka87/glid-3-xl")
version = model.versions.get("d74db2a276065cf0d42fe9e2917219112ddf8c698f5d9acbe1cc353b58097dab")
text2image_generations = list(
    version.predict(
        prompt="fresh avocado, digital art",
        guidance_scale=10.0,
        batch_size=3,
        init_image="https://t1.ftcdn.net/jpg/00/79/43/44/240_F_79434473_qNSi5WUEi8y3oFrwPjupQvxbUIzXY7mE.jpg",
        steps=100,
        init_skip_fraction=0.5,
        seed=0,
    )
)[-1]  # grab the final generation - don't need intermediate outputs.
print(text2image_generations)
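To try more than one init image, we could build one set of arguments per search result. The sketch below only prepares the arguments and does not call Replicate. `build_prediction_inputs` is a hypothetical helper, and the fixed values simply mirror the ones used in the example above.

```python
def build_prediction_inputs(results, prompt, top_k=3):
    """Return one dict of keyword arguments per init image, for the first top_k results."""
    inputs = []
    for result in results[:top_k]:
        inputs.append({
            "prompt": prompt,
            "init_image": result["url"],  # the model accepts a URL directly
            "guidance_scale": 10.0,
            "batch_size": 3,
            "steps": 100,
            "init_skip_fraction": 0.5,
            "seed": 0,
        })
    return inputs
```

Each dict can then be unpacked into the same call as before, for example `version.predict(**kwargs)` for every `kwargs` in `build_prediction_inputs(results, "fresh avocado, digital art")`.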

Query LAION-5B with images

Another cool thing we can do with clip-retrieval is take an existing image and try to find images similar to it.

For this, we will need CLIP. Let’s load CLIP with some helper methods for converting torch tensors to the numpy arrays that clip-retrieval expects.

You can find similar examples and usage in the official clip_retrieval.clip_client notebook.

Load CLIP

import clip
import torch

model, preprocess = clip.load("ViT-L/14", device="cpu", jit=True)

import io
import urllib.request  # importing the submodule explicitly makes urllib.request available

import numpy as np
from PIL import Image


def download_image(url):
    # Fetch the image and wrap its bytes in a file-like object for PIL.
    urllib_request = urllib.request.Request(url, data=None)
    with urllib.request.urlopen(urllib_request, timeout=10) as r:
        img_stream = io.BytesIO(r.read())
    return img_stream


def get_image_emb(image_url):
    with torch.no_grad():
        image = Image.open(download_image(image_url))
        image_emb = model.encode_image(preprocess(image).unsqueeze(0).to("cpu"))
        # Normalize to unit length, then convert to a 1-D float32 numpy array.
        image_emb /= image_emb.norm(dim=-1, keepdim=True)
        image_emb = image_emb.cpu().detach().numpy().astype("float32")[0]
        return image_emb
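Before sending an embedding to the search backend, it can help to confirm that it has the shape the helper above is meant to produce: a 1-D float32 vector with unit length. The sketch below is a hypothetical sanity check, not part of clip-retrieval. The name `check_embedding` and the tolerance are assumptions.

```python
import numpy as np


def check_embedding(emb, atol=1e-3):
    """Raise ValueError unless emb is a 1-D float32 vector with unit L2 norm."""
    emb = np.asarray(emb)
    if emb.ndim != 1:
        raise ValueError(f"expected a 1-D vector, got shape {emb.shape}")
    if emb.dtype != np.float32:
        raise ValueError(f"expected float32, got {emb.dtype}")
    norm = float(np.linalg.norm(emb))
    if abs(norm - 1.0) > atol:
        raise ValueError(f"expected unit norm, got {norm:.4f}")
    return emb
```

Calling `check_embedding(get_image_emb(url))` before a query would surface a forgotten normalization or a stray batch dimension early, rather than as a confusing set of search results.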

Convert your image to a CLIP embedding and pass the embedding to clip-retrieval

Instead of using text as input and converting it into a text embedding, we now use an image as input and convert it into an image embedding.

Let’s take this image of a model wearing a blue dress and find some similar images.

The input image is an image of a woman wearing a blue dress.

blue_dress_image_emb = get_image_emb(
    "https://rukminim1.flixcart.com/image/612/612/kv8fbm80/dress/b/5/n/xs-b165-royal-blue-babiva-fashion-original-imag86psku5pbx2g.jpeg?q=70"
)
blue_dress_results = laion5b_search_client.query(
    embedding_input=blue_dress_image_emb.tolist()
)
blue_dress_results

Again, the response will be a JSON array of results containing a caption, url, and similarity.

[
  {
    "caption": "8c7889e0b92b Cinderella Divine 1295 Long Chiffon Grecian Royal Blue Dress Mid Length Sleeves V Neck ...",
    "id": 2463946620,
    "similarity": 0.9428964853286743,
    "url": "https://cdn.shopify.com/s/files/1/1417/0920/products/1295cd-royal-blue_cfcbd4bc-ed74-47c0-8659-c1b8691990df.jpg?v=1527650905"
  },
  {
    "caption": "Classy V-Neck A-Line Floor Length Zipper-Up Mother Of the Bride Dress",
    "id": 717054383,
    "similarity": 0.9329575896263123,
    "url": ...
