Make smooth AI generated videos with AnimateDiff and an interpolator
Posted October 4, 2023 by fofr and zsxkib
In this blog post we’ll show you how to combine AnimateDiff and the ST-MFNet frame interpolator to create smooth and realistic videos from a text prompt. You can also specify camera movements using new controls.
You’ll go from a text prompt to a video, and then to a high-frame-rate video.
[Video: a generated clip before and after interpolation]
Create animations with AnimateDiff
AnimateDiff is a model that enhances existing text-to-image models by adding a motion modeling module. The motion module is trained on video clips to capture realistic motion dynamics. It allows Stable Diffusion text-to-image models to create animated outputs, ranging from anime to realistic photographs.
You can try AnimateDiff on Replicate.
Control camera movement
LoRAs provide an efficient way to speed up the fine-tuning of large models without using much memory. They are best known for their use with Stable Diffusion models, where they act as lightweight extensions that teach a model a style or subject. The same concept can be applied to an AnimateDiff motion module.
The original AnimateDiff authors have trained 8 new LoRAs for specific camera movements:
Pan up
Pan down
Pan left
Pan right
Zoom in
Zoom out
Rotate clockwise
Rotate anti-clockwise
Using the Replicate-hosted model, you can use all of these and choose how strong their effect will be (between 0 and 1). You can also combine multiple camera movements and strengths to create specific effects.
In this example we used the ‘toonyou_beta3’ model with a zoom-in strength of 1 (view and tweak these settings):
[Video: ‘toonyou_beta3’ output with a zoom-in strength of 1]
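To make the idea of combining camera movements concrete, here is a minimal sketch that builds a model input with several strengths and checks that each one lies between 0 and 1. The movement keys and the `<movement>_motion_strength` naming pattern are assumptions made for illustration, not names confirmed by this post; check the model's input schema on Replicate for the real parameter names.

```python
# Hypothetical movement keys, one per camera-movement LoRA listed above.
CAMERA_MOVEMENTS = (
    "pan_up", "pan_down", "pan_left", "pan_right",
    "zoom_in", "zoom_out", "rotate_cw", "rotate_ccw",
)


def build_camera_input(prompt, strengths):
    """Return an input dict with a prompt plus camera-movement strengths.

    `strengths` maps a movement key to a value between 0 and 1.
    """
    inputs = {"prompt": prompt}
    for movement, strength in strengths.items():
        if movement not in CAMERA_MOVEMENTS:
            raise ValueError(f"unknown camera movement: {movement}")
        if not 0.0 <= strength <= 1.0:
            raise ValueError(f"{movement} strength must be between 0 and 1")
        # Assumed naming pattern; the hosted model may use different names.
        inputs[f"{movement}_motion_strength"] = strength
    return inputs


# Combine a full-strength zoom-in with a gentler pan to the left.
example = build_camera_input(
    "a medium shot of a vibrant coral reef with a variety of marine life",
    {"zoom_in": 1.0, "pan_left": 0.5},
)
print(example)
```

Validating the range locally means a typo such as a strength of 10 fails before a prediction is started.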
Interpolate videos with ST-MFNet
Interpolation adds extra frames to a video. This increases the frame rate and makes the video smoother.
ST-MFNet is a ‘spatio-temporal multi-flow network for frame interpolation’, which is a fancy way of saying it’s a machine learning model that generates extra frames for a video. It does this by studying the changes in space (position of objects) and time (from one frame to another). The “multi-flow” part means it’s considering multiple ways things can move or change from one frame to the next. ST-MFNet works very well with AnimateDiff videos.
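As a rough illustration of what "adding extra frames" means, the sketch below inserts new frames between each pair of existing ones by naive linear blending. This is not how ST-MFNet works: it estimates motion with a learned network, whereas blending only cross-fades pixel values. Frames are represented here as flat lists of numbers purely for illustration.

```python
def blend_frames(frame_a, frame_b, t):
    """Linearly blend two frames; t=0 gives frame_a, t=1 gives frame_b."""
    return [(1 - t) * a + t * b for a, b in zip(frame_a, frame_b)]


def interpolate(frames, multiplier):
    """Insert (multiplier - 1) blended frames between each consecutive pair."""
    if not frames:
        return []
    result = []
    for current, following in zip(frames, frames[1:]):
        for step in range(multiplier):
            result.append(blend_frames(current, following, step / multiplier))
    # The final frame has no successor, so it is kept as-is.
    result.append(frames[-1])
    return result


# Three tiny two-pixel "frames" become five after doubling.
frames = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
smooth = interpolate(frames, 2)
print(len(smooth), smooth)
```

Cross-fading like this produces ghosting on fast motion, which is the kind of artifact a flow-based model is designed to avoid.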
You can take a 2-second, 16 frames-per-second (fps) AnimateDiff video and increase it to 32 or 64 fps using ST-MFNet:
[Video: the AnimateDiff clip interpolated to a higher frame rate]
You can also turn it into a slow-motion 4-second video:
[Video: the same clip in slow motion]
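The arithmetic behind these two options can be sketched as follows. The function is hypothetical and idealized: it assumes the frame count is multiplied exactly by the multiplier (the real count may differ slightly), and that turning off `keep_original_duration` plays the extra frames at the original frame rate.

```python
def interpolated_video(duration_s, fps, multiplier, keep_original_duration):
    """Return (frame_count, fps, duration_s) after idealized interpolation."""
    frames = round(duration_s * fps)
    new_frames = frames * multiplier  # idealized; exact counts may differ
    if keep_original_duration:
        # Same length, more frames per second: a smoother video.
        return new_frames, fps * multiplier, duration_s
    # Same frame rate, more frames: a longer, slow-motion video.
    return new_frames, fps, new_frames / fps


# A 2-second, 16 fps clip has 32 frames.
smooth_2x = interpolated_video(2, 16, 2, keep_original_duration=True)
smooth_4x = interpolated_video(2, 16, 4, keep_original_duration=True)
slow_motion = interpolated_video(2, 16, 2, keep_original_duration=False)
print(smooth_2x)    # 64 frames at 32 fps over 2 seconds
print(smooth_4x)    # 128 frames at 64 fps over 2 seconds
print(slow_motion)  # 64 frames at 16 fps over 4 seconds
```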
In this video we used the ‘realisticVisionV20_v20’ model with a landscape prompt. We kept the prompt and seed the same but changed the camera movement each time, then interpolated the videos:
[Video: ‘realisticVisionV20_v20’ landscape with different camera movements, interpolated]
Use the API to create a workflow
You can use the Replicate API to combine multiple models into a workflow, taking the output of one model and using it as input to another model.
Python
import replicate

# Create a Replicate client with your API token
client = replicate.Client(api_token="YOUR_REPLICATE_API_TOKEN")

print("Using AnimateDiff to generate a video")
output = client.run(
    "zsxkib/animate-diff:269a616c8b0c2bbc12fc15fd51bb202b11e94ff0f7786c026aa905305c4ed9fb",
    input={"prompt": "a medium shot of a vibrant coral reef with a variety of marine life"},
)
video = output[0]
print(video)
# https://pbxt.replicate.delivery/HnKtEcfWIoTIby5mGUufWwrXfHZ5VLpAnIHERSrNuiVAzfqGB/0-amediumshotofa.mp4

print("Using ST-MFNet to interpolate the video")
videos = client.run(
    "zsxkib/st-mfnet:2ccdad61a6039a3733d1644d1b71ebf7d03531906007590b8cdd4b051e3fbcd7",
    input={"mp4": video, "keep_original_duration": True, "framerate_multiplier": 4},
)
# Take the last item from the returned outputs
video = list(videos)[-1]
print(video)
# https://pbxt.replicate.delivery/VgwJdbh4NTZKEZpAaDhbzni1DGxzXOrHrCz5clFXIIGXOyaE/tmpaz7xlcls0-amediumshotofa_2.mp4
JavaScript
import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

console.log("Using AnimateDiff to generate a video");
const output = await replicate.run(
  "zsxkib/animate-diff:269a616c8b0c2bbc12fc15fd51bb202b11e94ff0f7786c026aa905305c4ed9fb",
  { input: { prompt: "a medium shot of a vibrant coral reef with a variety of marine life" } }
);

const video = output[0];
console.log(video);
// https://pbxt.replicate.delivery/HnKtEcfWIoTIby5mGUufWwrXfHZ5VLpAnIHERSrNuiVAzfqGB/0-amediumshotofa.mp4

console.log("Using ST-MFNet to interpolate the video");
const videos = await replicate.run(
  "zsxkib/st-mfnet:2ccdad61a6039a3733d1644d1b71ebf7d03531906007590b8cdd4b051e3fbcd7",
  { input: { mp4: video, keep_original_duration: true, framerate_multiplier: 4 } }
);
console.log(videos[1]);
// https://pbxt.replicate.delivery/VgwJdbh4NTZKEZpAaDhbzni1DGxzXOrHrCz5clFXIIGXOyaE/tmpaz7xlcls0-amediumshotofa_2.mp4
CLI
You can also use the CLI for Replicate to create a workflow:
export REPLICATE_API_TOKEN="..."

replicate run zsxkib/st-mfnet --web \
  keep_original_duration=true \
  framerate_multiplier=4 \
  mp4="$(replicate run zsxkib/animate-diff \
    prompt="a medium shot of a vibrant coral reef with a variety of marine life" | \
    jq -r '.output | join("")')"
# Opens https://replicate.com/p/p2j74vlbv464cojdne6sol6gq4
Wrapping up
Have you used AnimateDiff and ST-MFNet to make a video? Great! We’d love to see it.
Share your videos with us on Discord or tweet them @replicate. Let’s see what you’ve got!