zai-org/CogVideo
Python
Captured source
source ↗zai-org/CogVideo
Description: text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Language: Python
License: Apache-2.0
Stars: 12774
Forks: 1305
Open issues: 113
Created: 2022-05-29T06:46:18Z
Pushed: 2025-11-04T11:19:04Z
Default branch: main
Fork: no
Archived: no
README:
CogVideo & CogVideoX
[中文阅读](./README_zh.md)
[日本語で読む](./README_ja.md)
Experience the CogVideoX-5B model online at 🤗 Huggingface Space or 🤖 ModelScope Space
📚 View the paper and user guide
👋 Join our WeChat and Discord
📍 Visit QingYing and API Platform to experience larger-scale commercial video generation models.
Project Updates
- 🔥🔥 News: ``
2025/03/24``: We have launched CogKit, a fine-tuning and inference framework for the CogView4 and CogVideoX series. This toolkit allows you to fully explore and utilize our multimodal generation models. - 🔥 News: ``
2025/02/28: DDIM Inverse is now supported inCogVideoX-5BandCogVideoX1.5-5B`. Check [here](inference/ddim_inversion.py). - 🔥 News: ``
2025/01/08: We have updated the code forLorafine-tuning based on thediffusers` version model, which uses less GPU memory. For more details, please see [here](finetune/README.md). - 🔥 News: ``
2024/11/15: We released theCogVideoX1.5` model in the diffusers version. Only minor parameter adjustments are needed to continue using previous code. - 🔥 News: ``
2024/11/08``: We have released the CogVideoX1.5 model. CogVideoX1.5 is an upgraded version of the open-source model CogVideoX.
The CogVideoX1.5-5B series supports 10-second videos with higher resolution, and CogVideoX1.5-5B-I2V supports video generation at any resolution. The SAT code has already been updated, while the diffusers version is still under adaptation. Download the SAT version code here.
- 🔥 News: ``
2024/10/13: A more cost-effective fine-tuning framework forCogVideoX-5B` that works with a single
4090 GPU, cogvideox-factory, has been released. It supports fine-tuning with multiple resolutions. Feel free to use it!
- 🔥 News: ``
2024/10/10``: We have updated our technical report. Please
click here to view it. More training details and a demo have been added. To see the demo, click here.- 🔥 News: ``2024/10/09``: We have publicly released the technical documentation for CogVideoX fine-tuning on Feishu, further increasing distribution flexibility. All examples in the public documentation can be fully reproduced.
- 🔥 News: ``
2024/9/19``: We have open-sourced the CogVideoX series image-to-video model CogVideoX-5B-I2V.
This model can take an image as a background input and generate a video combined with prompt words, offering greater controllability. With this, the CogVideoX series models now support three tasks: text-to-video generation, video continuation, and image-to-video generation. Welcome to try it online at Experience.
- 🔥 ``
2024/9/19``: The Caption
model CogVLM2-Caption, used in the training process of CogVideoX to convert video data into text descriptions, has been open-sourced. Welcome to download and use it.
- 🔥 ``
2024/8/27``: We have open-sourced a larger model in the CogVideoX series, CogVideoX-5B. We have
significantly optimized the model's inference performance, greatly lowering the inference threshold. You can run CogVideoX-2B on older GPUs like GTX 1080TI, and CogVideoX-5B on desktop GPUs like RTX 3060. Please strictly follow the [requirements](requirements.txt) to update and install dependencies, and refer to [cli_demo](inference/cli_demo.py) for inference code. Additionally, the open-source license for the CogVideoX-2B model has been changed to the Apache 2.0 License.
- 🔥 ``
2024/8/6``: We have open-sourced 3D Causal VAE, used for CogVideoX-2B, which can reconstruct videos with
almost no loss.
- 🔥 ``
2024/8/6``: We have open-sourced the first model of the CogVideoX series video generation models, **CogVideoX-2B
**.
- 🌱 Source: ``
2022/5/19``: We have open-sourced the CogVideo video generation model (now you can see it in
the CogVideo branch). This is the first open-source large Transformer-based text-to-video generation model. You can access the ICLR'23 paper for technical details.
Table of Contents
Jump to a specific section:
- [Quick Start](#quick-start)
- [Prompt Optimization](#prompt-optimization)
- [SAT](#sat)
- [Diffusers](#diffusers)
- [Gallery](#gallery)
- [CogVideoX-5B](#cogvideox-5b)
- [CogVideoX-2B](#cogvideox-2b)
- [Model Introduction](#model-introduction)
- [Friendly Links](#friendly-links)
- [Project Structure](#project-structure)
- [Quick Start with Colab](#quick-start-with-colab)
- [Inference](#inference)
- [finetune](#finetune)
- [sat](#sat-1)
- [Tools](#tools)
- [CogVideo(ICLR'23)](#cogvideoiclr23)
- [Citation](#citation)
- [Model-License](#model-license)
Quick Start
Prompt Optimization
Before running the model, please refer to [this guide](inference/convert_demo.py) to see how we use large models like GLM-4 (or other comparable products, such as GPT-4) to optimize the model. This is crucial because the model is trained with long prompts, and a good prompt directly impacts the quality of the video generation.
SAT
Please make sure your Python version is between 3.10 and 3.12, inclusive of both 3.10 and 3.12.
Follow instructions in [sat_demo](sat/README.md): Contains the inference code and fine-tuning code of SAT weights. It is recommended to improve based on the CogVideoX model structure. Innovative researchers use this code to better perform rapid stacking and development.
Diffusers
Please make sure your Python version is between 3.10 and 3.12, inclusive of both 3.10 and 3.12.
pip install -r requirements.txt
Then follow [diffusers_demo](inference/cli_demo.py): A more detailed explanation of the inference code, mentioning the significance of common parameters.
For more details on quantized inference, please refer to…
Excerpt shown — open the source for the full document.