stepfun-ai/Step-Video-TI2V
Language: Python
License: MIT
Stars: 375
Forks: 35
Open issues: 6
Created: 2025-03-06T04:42:11Z
Pushed: 2025-03-20T09:10:36Z
Default branch: main
Fork: no
Archived: no
README:
🔥🔥🔥 News!!
- Mar 17, 2025: 👋 We release the inference code and model weights of Step-Video-TI2V. Download
- Mar 17, 2025: 👋 We release a new TI2V benchmark Step-Video-TI2V-Eval
- Mar 17, 2025: 👋 Step-Video-TI2V has been integrated into ComfyUI-Stepvideo-ti2v. Enjoy!
- Mar 17, 2025: 🎉 We have made our technical report available as open source. Read
Motion Control
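- The warhorse jumps (战马跳跃)
- The warhorse crouches (战马蹲下)
- The warhorse gallops forward, then turns around (战马向前奔跑,然后转身)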
Motion Dynamics Control
The same prompt, "Two men are boxing each other while the camera orbits around them" (两名男子在互相拳击,镜头环绕两人拍摄。), is shown at three settings: motion_score: 2, motion_score: 5, and motion_score: 20.
🎯 Tips: The default motion_score = 5 is suitable for general use. If you need more stability, set motion_score = 2, though it may lack dynamism in certain movements. For greater movement flexibility, you can use motion_score = 10 or motion_score = 20 to enable more intense actions. Feel free to customize the motion_score based on your creative needs to fit different use cases.
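The tips above can be captured in a small lookup. This is a minimal sketch, not part of the repository: the preset names (`stable`, `default`, `dynamic`, `intense`) and both helper functions are hypothetical, and only the numeric values and the `--motion_score` flag come from this README.

```python
from typing import List

# Hypothetical preset names; only the numeric values come from the tips above.
MOTION_PRESETS = {
    "stable": 2.0,    # more stability, possibly less dynamism
    "default": 5.0,   # suggested for general use
    "dynamic": 10.0,  # more movement flexibility
    "intense": 20.0,  # most intense actions
}


def pick_motion_score(preset: str = "default") -> float:
    """Return the motion_score for a named preset (hypothetical helper)."""
    if preset not in MOTION_PRESETS:
        raise ValueError(f"unknown preset {preset!r}; choose from {sorted(MOTION_PRESETS)}")
    return MOTION_PRESETS[preset]


def motion_flag(preset: str = "default") -> List[str]:
    """Format the preset as command-line arguments for the inference script."""
    return ["--motion_score", str(pick_motion_score(preset))]


print(motion_flag("stable"))  # ['--motion_score', '2.0']
```

Any positive value can still be passed directly; the presets only name the four values the tips mention.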
Camera Control
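- The camera orbits the girl as she dances (镜头环绕女孩,女孩在跳舞)
- The camera slowly pushes in as the girl dances (镜头缓慢推进,女孩在跳舞)
- The camera pulls back as the girl dances (镜头拉远,女孩在跳舞)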
Supported Camera Movements | 支持的运镜方式

| Camera Movement | 运镜方式 |
|--------------------------------|--------------------|
| Fixed Camera | 固定镜头 |
| Pan Up/Down/Left/Right | 镜头上/下/左/右移 |
| Tilt Up/Down/Left/Right | 镜头上/下/左/右摇 |
| Zoom In/Out | 镜头放大/缩小 |
| Dolly In/Out | 镜头推进/拉远 |
| Camera Rotation | 镜头旋转 |
| Tracking Shot | 镜头跟随 |
| Orbit Shot | 镜头环绕 |
| Rack Focus | 焦点转移 |
🔧 Motion Score Considerations: motion_score = 5 or 10 offers smoother and more accurate motion than motion_score = 2, with motion_score = 10 providing the best responsiveness and camera tracking. Choosing the suitable setting enhances motion precision and fluidity.
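One way to use the camera vocabulary above is to prepend a camera phrase to the subject action, as the demo captions do. The sketch below is an illustration only: `CAMERA_TERMS` and `build_prompt` are hypothetical names, the dictionary covers just a subset of the table, and the comma-joined pattern is inferred from the demo captions rather than a documented prompt format.

```python
# Subset of the camera movement table, keyed by the English name.
CAMERA_TERMS = {
    "Fixed Camera": "固定镜头",
    "Zoom In": "镜头放大",
    "Zoom Out": "镜头缩小",
    "Dolly In": "镜头推进",
    "Dolly Out": "镜头拉远",
    "Camera Rotation": "镜头旋转",
    "Tracking Shot": "镜头跟随",
    "Orbit Shot": "镜头环绕",
    "Rack Focus": "焦点转移",
}


def build_prompt(camera_movement: str, action: str) -> str:
    """Join a camera phrase and a subject action into one prompt (assumed pattern)."""
    if camera_movement not in CAMERA_TERMS:
        raise KeyError(f"unsupported camera movement: {camera_movement!r}")
    return f"{CAMERA_TERMS[camera_movement]},{action}"


# Reproduces the third camera-control demo caption.
print(build_prompt("Dolly Out", "女孩在跳舞"))  # 镜头拉远,女孩在跳舞
```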
Anime-Style Generation
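- A girl walks forward against a softly blurred background (女生向前行走,背景是虚化模糊的效果)
- A woman blinks, then blows a kiss toward the camera (女人眨眼,然后对着镜头做飞吻的动作。)
- The raccoon dog warrior slowly raises both hands as lightning spreads outward from them; behind him, the eyes of a spirit beast's image flash with intense light, and it opens its huge jaws with a low growl (狸猫战士双手缓缓上扬,雷电从手中向四周扩散, 身后灵兽影像的双眼闪烁强光,张开巨口发出低吼)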
Step-Video-TI2V excels in anime-style generation, enabling you to explore various anime-style images and create customized videos to match your preferences.
Table of Contents
1. [Introduction](#1-introduction)
2. [Model Summary](#2-model-summary)
3. [Model Download](#3-model-download)
4. [Model Usage](#4-model-usage)
5. [Comparisons](#5-comparisons)
6. [Online Engine](#6-online-engine)
7. [Citation](#7-citation)
1. Introduction
We present Step-Video-TI2V, a state-of-the-art text-driven image-to-video generation model with 30B parameters, capable of generating videos up to 102 frames based on both text and image inputs. We build Step-Video-TI2V-Eval as a new benchmark for the text-driven image-to-video task and compare Step-Video-TI2V with open-source and commercial TI2V engines using this dataset. Experimental results demonstrate the state-of-the-art performance of Step-Video-TI2V in the image-to-video generation task.
2. Model Summary
Step-Video-TI2V is trained based on Step-Video-T2V. To incorporate the image condition as the first frame of the generated video, we encode it into latent representations using Step-Video-T2V’s Video-VAE and concatenate them along the channel dimension of the video latent. Additionally, we introduce a motion score condition, enabling users to control the dynamic level of the video generated from the image condition.
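The channel-wise conditioning described above can be illustrated with toy data. This is a sketch under stated assumptions, not the model's code: latents are plain nested lists indexed as `[frame][channel][row][col]`, the shapes are arbitrary, all function names are hypothetical, and zero-filling the frames after the first is an assumption this README does not confirm.

```python
def make_latent(frames, channels, height, width, fill=0.0):
    """Build a toy latent indexed as [frame][channel][row][col]."""
    return [
        [[[fill] * width for _ in range(height)] for _ in range(channels)]
        for _ in range(frames)
    ]


def latent_shape(latent):
    """Return (frames, channels, height, width) of a non-empty toy latent."""
    return (len(latent), len(latent[0]), len(latent[0][0]), len(latent[0][0][0]))


def build_image_condition(image_latent, num_frames):
    """Put the encoded image at frame 0 and zero-fill the rest (assumption)."""
    _, channels, height, width = latent_shape(image_latent)
    padding = make_latent(num_frames - 1, channels, height, width, fill=0.0)
    return image_latent + padding


def concat_channels(video_latent, condition):
    """Concatenate two latents along the channel axis, frame by frame."""
    v_shape, c_shape = latent_shape(video_latent), latent_shape(condition)
    if (v_shape[0], v_shape[2], v_shape[3]) != (c_shape[0], c_shape[2], c_shape[3]):
        raise ValueError("frame count and spatial size must match")
    return [v_frame + c_frame for v_frame, c_frame in zip(video_latent, condition)]


video = make_latent(frames=3, channels=4, height=2, width=2, fill=1.0)
image = make_latent(frames=1, channels=4, height=2, width=2, fill=0.5)
model_input = concat_channels(video, build_image_condition(image, num_frames=3))
print(latent_shape(model_input))  # (3, 8, 2, 2)
```

The channel count doubles while the frame count and spatial size stay the same, so the network sees the image condition alongside every video latent frame.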
3. Model Download
| Models | 🤗 Huggingface | 🤖 Modelscope | 🎛️ ComfyUI |
|:------------------:|:--------------:|:-------------:|:-----------------:|
| Step-Video-TI2V | Download | Download | Link |
4. Model Usage
📜 4.1 Dependencies and Installation
    git clone https://github.com/stepfun-ai/Step-Video-TI2V.git
    conda create -n stepvideo python=3.10
    conda activate stepvideo

    cd Step-Video-TI2V
    pip install -e .
🚀 4.2. Inference Scripts
    # Start the caption and VAE API server in the background.
    # We assume you have more than 4 GPUs available. This command will return the URL
    # for both the caption API and the VAE API. Please use the returned URL below.
    python api/call_remote_server.py --model_dir where_you_download_dir &

    parallel=4  # or parallel=8; parallel=1 (a single GPU) also works, although it will take longer
    url='127.0.0.1'
    model_dir=where_you_download_dir

    torchrun --nproc_per_node $parallel run_parallel.py \
        --model_dir $model_dir --vae_url $url --caption_url $url \
        --ulysses_degree $parallel \
        --prompt "笑起来" --first_image_path ./assets/demo.png \
        --infer_steps 50 --cfg_scale 9.0 --time_shift 13.0 --motion_score 5.0
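For scripted runs, the same `torchrun` invocation can be assembled in Python. The sketch below is an assumption-laden convenience, not something the repository ships: `build_torchrun_command` is a hypothetical helper, it uses only the flags shown in the command above, and it expects the API server to be running already.

```python
import shlex
from typing import List


def build_torchrun_command(
    model_dir: str,
    url: str,
    prompt: str,
    first_image_path: str,
    parallel: int = 4,
    infer_steps: int = 50,
    cfg_scale: float = 9.0,
    time_shift: float = 13.0,
    motion_score: float = 5.0,
) -> List[str]:
    """Return the argument list for the inference command (hypothetical helper)."""
    return [
        "torchrun", "--nproc_per_node", str(parallel), "run_parallel.py",
        "--model_dir", model_dir,
        "--vae_url", url,
        "--caption_url", url,
        "--ulysses_degree", str(parallel),
        "--prompt", prompt,
        "--first_image_path", first_image_path,
        "--infer_steps", str(infer_steps),
        "--cfg_scale", str(cfg_scale),
        "--time_shift", str(time_shift),
        "--motion_score", str(motion_score),
    ]


cmd = build_torchrun_command(
    model_dir="where_you_download_dir",
    url="127.0.0.1",
    prompt="笑起来",
    first_image_path="./assets/demo.png",
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would launch the job; building a list instead of a single string avoids shell-quoting problems with non-ASCII prompts.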
We list some more useful configurations for easy usage:
| Argument | Default | Description |
|:----------------------:|:---------:|:-----------------------------------------:|
| --model_dir | None | The model checkpoint for video generation |
| --prompt | “笑起来” (“smile”) | The text prompt for I2V generation |
| --first_image_path | ./assets/demo.png | The reference image path for the I2V task |
| --infer_steps | 50 | The number of sampling steps |
| --cfg_scale | 9.0 | Embedded classifier-free guidance scale |
| --time_shift | 7.0 | Shift factor for flow matching schedulers |
| --motion_score | 5.0 | Score to control the motion level of the video |
| --seed | None | The random seed for video generation; if None, a random seed is initialized |
| --use-cpu-offload | False | Use CPU offload when loading the model to save memory; necessary for high-resolution video generation |
| --save-path | ./results | Path to save the generated video |
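To make the defaults in the table concrete, here is a hypothetical `argparse` mirror of it. It is a sketch for illustration and is not the parser used by `run_parallel.py`: flag names and defaults are copied from the table, while the types and the `store_true` action for `--use-cpu-offload` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the options table; not the repository's parser."""
    parser = argparse.ArgumentParser(description="Step-Video-TI2V options (sketch)")
    parser.add_argument("--model_dir", type=str, default=None)
    parser.add_argument("--prompt", type=str, default="笑起来")
    parser.add_argument("--first_image_path", type=str, default="./assets/demo.png")
    parser.add_argument("--infer_steps", type=int, default=50)
    parser.add_argument("--cfg_scale", type=float, default=9.0)
    parser.add_argument("--time_shift", type=float, default=7.0)
    parser.add_argument("--motion_score", type=float, default=5.0)
    parser.add_argument("--seed", type=int, default=None)
    # Assumed to be a boolean switch, since the table lists its default as False.
    parser.add_argument("--use-cpu-offload", action="store_true")
    parser.add_argument("--save-path", type=str, default="./results")
    return parser


args = build_parser().parse_args(["--motion_score", "10", "--seed", "42"])
print(args.motion_score, args.time_shift, args.seed, args.use_cpu_offload)
# 10.0 7.0 42 False
```

Note that the example command in section 4.2 passes `--time_shift 13.0` explicitly, which overrides the 7.0 default listed in the table.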
5. Comparisons
To evaluate the performance of Step-Video-TI2V, we use VBench-I2V to systematically compare it with recently released leading open-source models. The detailed results in the table below highlight our model’s superior performance over these models. We present two results for Step-Video-TI2V, with the motion score set to 5 and 10, respectively. As expected, this mechanism effectively balances the motion dynamics and stability (or consistency) of the generated videos. Additionally, we submitted our results to the [VBench-I2V…
Notability
Notability: 6.0/10. New video generation repo from a notable lab, decent stars.