Replicate Blog, published Aug 8, 2023

Fine-tune SDXL with your own images


Posted August 8, 2023 by andreasjansson, anotherjesse, cloneofsimo, and daanelson

Stability AI recently open-sourced SDXL, the newest and most powerful version of Stable Diffusion yet. Replicate was ready from day one with a hosted version of SDXL that you can run from the web or using our cloud API.

Today, we’re following up to announce fine-tuning support for SDXL 1.0. Fine-tuning allows you to train SDXL on a particular object or style, and create a new model that generates images of those objects or styles. For example, we fine-tuned SDXL on images from the Barbie movie and on images of our colleague Zeke. There are multiple ways to fine-tune SDXL, such as Dreambooth, LoRA (originally developed for LLMs), and Textual Inversion. We’ve got all of these covered for SDXL 1.0.

In this post, we’ll show you how to fine-tune SDXL on your own images with one line of code and publish the fine-tuned result as your own hosted public or private model. You can train your model with just a few images, and the training process takes about 10-15 minutes. You can also download your fine-tuned LoRA weights to use elsewhere.

🍿 Watch the fine-tuning guide on YouTube

Contents

What is fine-tuning?

Prepare your training images

Add your Replicate API token

Create a model

Start the training

Fine-tuning with faces

Fine-tuning a style

Monitor training progress

Run the model

How fine-tuning works

Understanding learning rates

Advanced: Using your fine-tuned model with Diffusers

What’s next?

What is fine-tuning?

Fine-tuning is a process of taking a pre-trained model and training it with more data to create a new model that is better suited to a particular task. You can fine-tune image generation models like SDXL on your own images to create a new version of the model that is better at generating images of a particular person, object, or style.

Prepare your training images

The training API expects a zip file containing your training images. A handful of images (5-6) is enough to fine-tune SDXL on a single person, but you might need more if your training subject is more complex or the images are very different.

Check out the example datasets in the SDXL repository for inspiration.

Keep the following guidelines in mind when preparing your training images:

Images can be of yourself, your pet, your favorite stuffed animal, or any unique object.

Images should contain only the subject itself, without background noise or other objects.

Do not use images of other people without their consent.

Images can be in JPEG or PNG format.

Dimensions and size don’t matter.

Filenames don’t matter.

Put your images in a folder and zip it up. The directory structure of the zip file doesn’t matter:

zip -r data.zip data
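If you would rather build the archive from Python, the same step can be sketched with the standard library. This is a minimal sketch, not part of Replicate's tooling: the function name `zip_training_images` and the suffix filter are assumptions made for illustration, based on the guideline above that images can be JPEG or PNG.

```python
import zipfile
from pathlib import Path

# Assumption: only JPEG and PNG files are collected, per the guidelines above.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def zip_training_images(image_dir, zip_path="data.zip"):
    """Write every JPEG/PNG under image_dir into zip_path.

    Returns the list of archive names that were added.
    (Hypothetical helper, not part of the Replicate client.)
    """
    image_dir = Path(image_dir)
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(image_dir.rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                # Store paths relative to the image folder.
                arcname = path.relative_to(image_dir).as_posix()
                archive.write(path, arcname)
                added.append(arcname)
    return added
```

Calling `zip_training_images("data")` writes `data.zip` in the current directory and skips any non-image files it finds in the folder.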

Add your Replicate API token

Before starting the training job you need to grab your Replicate API token from replicate.com/account. In your shell, store that token in an environment variable called REPLICATE_API_TOKEN.

export REPLICATE_API_TOKEN=r8_...
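Because a missing token only surfaces once the API call fails, it can help to check for it up front. The following is a small sketch under that assumption; `get_replicate_token` is a hypothetical helper, and the only name taken from this guide is the `REPLICATE_API_TOKEN` environment variable.

```python
import os


def get_replicate_token(env=None):
    """Return the token from REPLICATE_API_TOKEN, or raise if it is unset.

    Hypothetical helper for illustration; pass a dict as env to test it.
    """
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it before starting a training"
        )
    return token
```

Calling `get_replicate_token()` at the top of a training script fails fast with a clear message instead of partway through the job setup.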

Upload your training data

Upload your zip file of training data somewhere on the internet that is publicly accessible, like an S3 bucket or a GitHub Pages site.

Create a model

You also need to create a model on Replicate that will be the destination for the trained SDXL version. Go to replicate.com/create to create the model. In the example below we call it my-name/my-model.

You can make your model public or private. If your model is private, only you will be able to run it. If your model is public, anyone will be able to run it, but only you will be able to update it.

Start the training

Now that you’ve gathered your training data and created a model, it’s time to start the training process using Replicate’s API.

This guide uses Python, but if you want to use another language you can use a client library or call the HTTP API directly.

If you don’t already have a Python environment configured, you can kick off the training process using a hosted Jupyter notebook on Google Colab.

Start by installing the Replicate Python package:

pip install replicate

Then create a training:

import replicate

training = replicate.trainings.create(
    version="stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b",
    input={
        # Publicly accessible URL of your zipped training images
        "input_images": "https://my-domain/my-input-images.zip",
    },
    # The model you created on Replicate
    destination="my-name/my-model",
)
print(training)

The input_images input parameter is required, but there are other inputs you can set as well. See the training inputs in the SDXL README for a full list of inputs. Training uses LoRA by default; to use Dreambooth instead, set is_lora to false (False in Python). To perform only textual inversion, set lora_lr to 0.
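The three training modes described above differ only in the input dictionary, which can be sketched as a small builder. This is an illustration, not part of the Replicate client: the function name and the mode labels ("lora", "dreambooth", "textual_inversion") are assumptions, while `input_images`, `is_lora`, and `lora_lr` are the inputs named in this guide.

```python
def build_training_input(input_images, mode="lora", **extra):
    """Build the `input` dict for a training, following the modes above.

    mode is a hypothetical label:
      "lora"              -> defaults (LoRA training)
      "dreambooth"        -> is_lora set to False
      "textual_inversion" -> lora_lr set to 0
    Any extra keyword arguments are passed through as additional inputs.
    """
    training_input = {"input_images": input_images}
    if mode == "dreambooth":
        training_input["is_lora"] = False
    elif mode == "textual_inversion":
        training_input["lora_lr"] = 0
    elif mode != "lora":
        raise ValueError(f"unknown mode: {mode!r}")
    training_input.update(extra)
    return training_input
```

The result can be passed as the `input` argument of `replicate.trainings.create`, for example `input=build_training_input("https://my-domain/my-input-images.zip", mode="dreambooth")`.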

Fine-tune using Dreambooth + LoRA with faces dataset

If you’re fine-tuning on faces, the default training parameters will work well, but you can also use the use_face_detection_instead setting. This will automatically use face segmentation so that training is focused only on the faces in your images.

import replicate

training = replicate.trainings.create(
    version="stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b",
    input={
        "input_images": "https://my-domain/face-images.zip",
        # Segment faces automatically so training focuses on them
        "use_face_detection_instead": True,
    },
    destination="my-name/my-model",
)

Fine-tuning a style

To get the best results for a style you need to:

Increase the LoRA learning rate (lora_lr). This stops the training from focusing too closely on the details. Experiment with different values, such as 1e-4 or 2e-4. Our Barbie fine-tune used 4e-4.

Use a different caption_prefix that refers to a style.

import replicate

training = replicate.trainings.create(
    version="stability-ai/sdxl:39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b",
    input={
        "input_images": "https://my-domain/style-images.zip",
        # Higher LoRA learning rate for styles
        "lora_lr": 2e-4,
        # Caption prefix that refers to a style rather than an object
        "caption_prefix": "In the style of TOK,",
    },
    destination="my-name/my-model",
)
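Since the advice above is to experiment with the learning rate, one way to organize that is to generate one input dictionary per candidate value. This is only a sketch: `style_sweep_inputs` is a hypothetical helper, and the default rates are simply the three values mentioned in this section.

```python
def style_sweep_inputs(
    input_images,
    learning_rates=(1e-4, 2e-4, 4e-4),
    caption_prefix="In the style of TOK,",
):
    """Return one training `input` dict per LoRA learning rate to try.

    Hypothetical helper; each dict can be passed as `input` to
    replicate.trainings.create in a separate training run.
    """
    return [
        {
            "input_images": input_images,
            "lora_lr": lr,
            "caption_prefix": caption_prefix,
        }
        for lr in learning_rates
    ]
```

Each run still needs a destination model, so you might train every variant and then compare the generated images before settling on one learning rate.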

To show what’s possible, we put together a couple of fine-tunes based on the Barbie and Tron: Legacy movies.

Monitor training progress

Visit replicate.com/trainings to follow the progress of your training job, or inspect the training programmatically:

training.reload()
print(training.status)
# Print the last 10 lines of the training logs
print("\n".join(training.logs.split("\n")[-10:]))
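To wait for a training to finish instead of checking by hand, the reload-and-inspect pattern above can be wrapped in a polling loop. This is a sketch under assumptions: the helper names, the polling interval, and the timeout are illustrative, and the set of terminal statuses ("succeeded", "failed", "canceled") is assumed here and should be checked against the API documentation. It only relies on the `reload()` method and the `status` and `logs` attributes used above.

```python
import time

# Assumption: statuses after which a training no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def tail_logs(logs, n=10):
    """Return the last n lines of the logs, or "" if there are none yet."""
    if not logs:
        return ""
    return "\n".join(logs.split("\n")[-n:])


def wait_for_training(training, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Reload the training until it reaches a terminal status.

    Returns the final status, or raises TimeoutError once the time spent
    waiting reaches timeout_seconds. The sleep argument exists so the
    loop can be tested without real delays.
    """
    waited = 0
    while training.status not in TERMINAL_STATUSES:
        if waited >= timeout_seconds:
            raise TimeoutError(f"training still {training.status!r} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
        training.reload()
    return training.status
```

With a training object from `replicate.trainings.create`, calling `wait_for_training(training)` followed by `print(tail_logs(training.logs))` shows the final status and the last few log lines.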

Run…
