
You can now fine-tune open-source video models


Replicate Blog


Posted January 24, 2025 by zsxkib zeke deepfates bfirsh

AI video generation has gotten really good.

Some of the best video models like tencent/hunyuan-video are open-source, and the community has been hard at work building on top of them. We’ve adapted the Musubi Tuner by @kohya_tech to run on Replicate, so you can fine-tune HunyuanVideo on your own visual content.

Never Gonna Give You Up animal edition, courtesy of @flngr and @fofr.

HunyuanVideo is good at capturing the style of the training data, not only in the visual appearance of the imagery and the color grading, but also in the motion of the camera and the way the characters move.

This in-motion style transfer is unique to this implementation: other video models that are trained only on images cannot capture it.

Here are some examples of videos created using different fine-tunes, all with the same settings, size, prompt and seed:

Fine-tunes shown: Twin Peaks, Pixar, Cowboy Bebop, Westworld.

You can make your own fine-tuned video model to:

Create videos in a specific visual style

Generate animations of particular characters

Capture specific types of motion or movement

Build custom video effects

In this post, we’ll show you how to gather training data, create a fine-tuned video model, and generate videos with it.

Note

Prefer to learn by watching? Check out Sakib’s 5-minute video demo on YouTube.

Prerequisites

A Replicate account

A video or YouTube URL to use as training data

Step 1: Create your training data

To train a video model, you’ll need a dataset of video clips and text captions describing each video.

This process can be time-consuming, so we’ve created a model to make it easier: zsxkib/create-video-dataset takes a video file or YouTube URL as input, slices it into smaller clips, and generates captions for each clip.

Here’s how to create training data right in your browser with just a few clicks:

Find a YouTube URL (or video file) that you want to use for training.

Go to replicate.com/zsxkib/create-video-dataset

Paste your video URL, or upload a video file from your computer.

Choose a unique trigger word like RCKRLL. Avoid using real words that have existing associations.

Click Run and download the resulting ZIP file.

Optional: Check the logs from this run if you want to see the auto-generated captions for each clip.
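Before uploading the ZIP for training, it can help to confirm that every clip has a caption. The sketch below is a minimal check under one assumption that the post does not confirm: that each clip in the ZIP is paired with a caption file of the same name and a `.txt` extension. The helper name `find_uncaptioned_clips` and the list of video extensions are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed clip formats; adjust to whatever your ZIP actually contains.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm"}


def find_uncaptioned_clips(zip_file):
    """Return names of video clips that lack a same-named .txt caption file."""
    with zipfile.ZipFile(zip_file) as archive:
        names = [n for n in archive.namelist() if not n.endswith("/")]
    # Paths of caption files with the extension stripped, e.g. "clip_001".
    captioned = {
        str(PurePosixPath(n).with_suffix(""))
        for n in names
        if PurePosixPath(n).suffix.lower() == ".txt"
    }
    return sorted(
        n
        for n in names
        if PurePosixPath(n).suffix.lower() in VIDEO_EXTENSIONS
        and str(PurePosixPath(n).with_suffix("")) not in captioned
    )


# Build a tiny in-memory ZIP to demonstrate the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("clip_001.mp4", b"")
    archive.writestr("clip_001.txt", "a video of RCKRLL, a man dancing")
    archive.writestr("clip_002.mp4", b"")  # deliberately left without a caption
buffer.seek(0)

missing = find_uncaptioned_clips(buffer)
print(missing)  # ['clip_002.mp4']
```

To check a real download, pass the path of the ZIP file instead of the in-memory buffer.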

Step 2: Train your model

Now you’ll create your own fine-tuned video generation model using the training data you just compiled.

Go to replicate.com/zsxkib/hunyuan-video-lora/train

Choose a name for your model.

For the input_videos input, upload the ZIP file you just downloaded.

Enter the same trigger word you used before, e.g. RCKRLL

Adjust training settings (we recommend starting with 2 epochs)

Click Create training

Training typically takes about 5-10 minutes with default settings, though the exact time depends on the size and number of clips.

Step 3: Run your model

Once the training is complete, you can generate new videos in several ways:

Run the model in your browser directly from your model’s page.

Run your model in Replicate’s Playground: go to “Manage models” and type your model name.

Use the API: Go to your model’s page and click the API tab for code snippets.

You can run your model as an API with just a few lines of code.

Here’s an example using the replicate-javascript client:


```javascript
import Replicate from "replicate"

const replicate = new Replicate()

const model = "your-username/your-model:your-model-version"
const prompt = "A lion dancing on a subway train in the style of RCKRLL"
const output = await replicate.run(model, { input: { prompt } })
console.log(output)
```
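The example prompt names the trigger word used during training. If you generate many prompts, a small helper can keep that consistent. This is a minimal sketch in Python: `build_prompt` is a hypothetical helper, and the "in the style of" phrasing simply mirrors the example prompt rather than any required format.

```python
def build_prompt(subject: str, trigger_word: str) -> str:
    """Compose a prompt that ends with the trigger word used during training."""
    subject = subject.strip().rstrip(".")
    trigger_word = trigger_word.strip()
    if not subject or not trigger_word:
        raise ValueError("subject and trigger_word must be non-empty")
    return f"{subject} in the style of {trigger_word}"


prompt = build_prompt("A lion dancing on a subway train", "RCKRLL")
print(prompt)  # A lion dancing on a subway train in the style of RCKRLL
```

The resulting string can be passed as the `prompt` input when you run your model.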

Step 4: Experiment for best results

Video fine-tuning is pretty new, so we’re still learning what works best.

Here are some early tips:

Use a unique trigger word that doesn’t have associations with real words.

Experiment with training settings:

More epochs generally means better quality but longer training time

Adjust the LoRA rank

Increase batch size to speed up training

Use max_steps to control training duration precisely

If training looks like it’s going to take several hours, cancel it and try:

Reducing the number of epochs

Reducing the rank

Increasing batch size

Check the GitHub README for detailed parameter explanations
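To see how these settings trade off, here is a rough estimate of the number of optimizer steps. It is a sketch under assumptions the post does not confirm: that the trainer takes one step per batch, that `max_steps` acts as an upper bound, and that there is no clip repetition or gradient accumulation. `estimate_steps` is a hypothetical helper, not part of any Replicate library.

```python
import math
from typing import Optional


def estimate_steps(
    num_clips: int,
    epochs: int,
    batch_size: int,
    max_steps: Optional[int] = None,
) -> int:
    """Estimate total training steps, assuming one optimizer step per batch."""
    if min(num_clips, epochs, batch_size) < 1:
        raise ValueError("num_clips, epochs and batch_size must be at least 1")
    steps_per_epoch = math.ceil(num_clips / batch_size)
    total = steps_per_epoch * epochs
    if max_steps is not None:
        # Assumption: max_steps caps the run rather than extending it.
        total = min(total, max_steps)
    return total


print(estimate_steps(num_clips=8, epochs=2, batch_size=8))                 # 2
print(estimate_steps(num_clips=8, epochs=2, batch_size=1))                 # 16
print(estimate_steps(num_clips=8, epochs=16, batch_size=1, max_steps=50))  # 50
```

Under these assumptions, a larger batch size cuts the step count, which is consistent with the tip that it speeds up training. The README has the authoritative parameter definitions.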

Extra credit: Train new models programmatically

If you want to automate the process or build applications, you can use our API.
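The Replicate client libraries read your API token from the `REPLICATE_API_TOKEN` environment variable. A small preflight check like the sketch below can fail early with a clear message instead of partway through a script. `require_api_token` is a hypothetical helper, and the token value shown is a placeholder.

```python
import os


def require_api_token(env=None) -> str:
    """Return the Replicate API token, or raise if it is missing or blank."""
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set REPLICATE_API_TOKEN before calling the Replicate API")
    return token


# Demonstrate with an explicit mapping so the example does not need a real token.
token = require_api_token({"REPLICATE_API_TOKEN": "example-token"})
print(token)  # example-token
```

In a real script, call `require_api_token()` with no arguments so it reads the actual environment.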

Here’s an example of how to train a new model programmatically using the Replicate Python client:

```python
import replicate
import time

# Create a training dataset from a video
dataset = replicate.run(
    "zsxkib/create-video-dataset:4eb83cc8ba563da7032933374444a9a7a6f630b5b1e4f219cf9088f6a4acc138",
    input={
        "video_url": "YOUR_VIDEO_URL",
        "trigger_word": "UNIQUE_TRIGGER",
        "start_time": 10,
        "end_time": 40,
        "num_segments": 8,
        "autocaption": True,
        "autocaption_prefix": "a video of UNIQUE_TRIGGER,",
    },
)

# Create a new model to store the training results
model = replicate.models.create(
    owner="your-username",
    name="your-model-name",
    visibility="public",
    hardware="gpu-t4",
)

# Start training with the processed video
training = replicate.trainings.create(
    model="zsxkib/hunyuan-video-lora",
    version="04279caf015c30a635cabc4077b5bd82c5c706262eb61797a48db139444bcca9",  # Current model version ID
    input={
        "input_videos": dataset.url,
        "trigger_word": "UNIQUE_TRIGGER",
        "epochs": 2,
        "batch_size": 8,
    },
    destination="your-username/your-model-name",  # Where to push the trained model
)

# Wait for training to complete
while training.status not in ["succeeded", "failed", "canceled"]:
    training.reload()
    time.sleep(10)  # Wait 10 seconds between checks

if training.status != "succeeded":
    raise Exception(f"Training failed: {training.error}")

# Generate new videos with your fine-tuned model
output = replicate.run(
    training.output["version"],
    input={
        "prompt": "A video of UNIQUE_TRIGGER in a cyberpunk city",
        "num_frames": 45,
        "frame_rate": 24,
    },
)
```
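The wait loop in the example polls until the training reaches a terminal status, with no upper bound on how long that takes. For automated pipelines you may want a timeout. The sketch below is one way to do that. `wait_for_training` and `FakeTraining` are hypothetical names. The helper only assumes an object with a `status` attribute and a `reload()` method, as used in the example, and the fake object stands in for a real training so the snippet runs without network access.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_training(training, timeout_s=3600.0, interval_s=10.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll training.reload() until a terminal status or until timeout_s elapses."""
    deadline = clock() + timeout_s
    while training.status not in TERMINAL_STATUSES:
        if clock() >= deadline:
            raise TimeoutError(
                f"Training still {training.status!r} after {timeout_s} seconds"
            )
        sleep(interval_s)
        training.reload()
    return training


class FakeTraining:
    """Stand-in for a training object; reports success after two reloads."""

    def __init__(self):
        self.status = "starting"
        self.reloads = 0

    def reload(self):
        self.reloads += 1
        self.status = "succeeded" if self.reloads >= 2 else "processing"


finished = wait_for_training(FakeTraining(), timeout_s=60, interval_s=0)
print(finished.status)  # succeeded
```

With a real training object, call `wait_for_training(training)` in place of the `while` loop and keep the `succeeded` check that follows it.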

What’s next?

Fine-tuning video models is in its early days, so we don’t yet know what is possible or what can be built on top of it.

Give it a try and show us what you’ve made on Discord, or tag @replicate on X.

