
VGen

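A versatile open-source video generation toolkit

VGen is a feature-rich open-source toolkit for video generation. It integrates several advanced video generation models and can create high-quality videos from inputs such as text, images, motion, and subjects. VGen provides practical tools for visualization, sampling, training, and inference, and supports tasks such as image-to-video and text-to-video. The project is extensible and complete, and is developed by the Tongyi Lab of Alibaba Group.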


VGen is an open-source video synthesis codebase developed by the Tongyi Lab of Alibaba Group, featuring state-of-the-art video generative models. The repository includes implementations of several methods; the release notes below name I2VGen-XL, ModelScopeT2V, HiGen, TF-T2V, InstructVideo, DreamVideo, and VideoLCM.

VGen can produce high-quality videos from input text, images, desired motion, desired subjects, and even feedback signals. It also offers a variety of commonly used video generation tools, such as visualization, sampling, training, inference, joint training on images and videos, acceleration, and more.


🔥News!!!

  • [2024.06] We release the code and models of InstructVideo. InstructVideo enables LoRA fine-tuning and inference in VGen. Feel free to use LoRA fine-tuning for other tasks.
  • [2024.04] We release the models of DreamVideo and ModelScopeT2V V1.5!!! ModelScopeT2V V1.5 is further fine-tuned on ModelScopeT2V for 365k iterations with more data.
  • [2024.04] We release the code and models of TF-T2V!
  • [2024.04] We release the code and models of VideoLCM!
  • [2024.03] We release the training and inference code of DreamVideo!
  • [2024.03] We release the code and model of HiGen!!
  • [2024.01] The Gradio demo of I2VGen-XL is now available on HuggingFace, thanks to support from our colleague @Wenmeng Zhou and @AK. You are welcome to try it out.
  • [2024.01] We support running the Gradio app locally, thanks to our colleague @Wenmeng Zhou for the support and @AK for the suggestion. You are welcome to try it.
  • [2024.01] Thanks to @Chenxi for supporting the running of I2VGen-XL on Replicate. Feel free to give it a try.
  • [2024.01] The Gradio demo of I2VGen-XL is now available on ModelScope. You are welcome to try it out.
  • [2023.12] We have open-sourced the code and models for DreamTalk, which can produce high-quality talking head videos across diverse speaking styles using diffusion models.
  • [2023.12] We release TF-T2V, which scales up existing video generation techniques using text-free videos and significantly improves the performance of both Modelscope-T2V and VideoComposer.
  • [2023.12] We updated the codebase to support newer versions of xformers (0.0.22) and torch 2.0+, and removed the dependency on flash_attn.
  • [2023.12] We release InstructVideo, which accepts human feedback signals to improve VLDM.
  • [2023.12] We release DreamTalk, a diffusion-based method for expressive talking head generation.
  • [2023.12] We release VideoLCM, a high-efficiency video generation method.
  • [2023.12] We release the code and model of I2VGen-XL and ModelScope T2V.
  • [2023.12] We release the T2V method HiGen and the customized T2V method DreamVideo.
  • [2023.12] We wrote an introduction document for VGen and compared I2VGen-XL with SVD.
  • [2023.11] We release a high-quality I2VGen-XL model; please refer to the webpage.

TODO

  • Release the technical papers and webpage of I2VGen-XL
  • Release the code and pretrained models that can generate 1280x720 videos
  • Release the code and models of DreamTalk, which can generate expressive talking heads
  • Release the code and pretrained models of HumanDiff
  • Release models optimized specifically for the human body and faces
  • Release an updated version that fully preserves identity while capturing large and accurate motions
  • Release other methods and the corresponding models

Preparation

The main features of VGen are as follows:

  • Expandability, allowing for easy management of your own experiments.
  • Completeness, encompassing all common components for video generation.
  • Excellent performance, featuring powerful pre-trained models in multiple tasks.

Installation

conda create -n vgen python=3.8
conda activate vgen
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

You also need to make sure that the ffmpeg command is installed on your system. If it is not, you can install it with the following command:

sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
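
As a quick sanity check after installation, the following minimal sketch reports whether ffmpeg is reachable from Python. It is not part of VGen; the helper name ffmpeg_version is hypothetical and only the Python standard library is used.

```python
import shutil
import subprocess


def ffmpeg_version(binary="ffmpeg"):
    """Return the first line of `<binary> -version`, or None if the binary is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    # check=False: a non-zero exit code should not raise, we only want the banner line.
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = ffmpeg_version()
    if version is None:
        print("ffmpeg not found on PATH; install it before running VGen")
    else:
        print(version)
```

If the script reports that ffmpeg is missing, rerun the apt-get command above or install ffmpeg through your platform's package manager.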

Datasets

We provide a demo dataset that includes images and videos, along with their list files, in the data directory.

Please note that the demo images used here are for testing purposes and were not included in the training.

Clone the code

git clone https://github.com/ali-vilab/VGen.git
cd VGen

Getting Started with VGen

(1) Train your text-to-video model

Enabling distributed training is as simple as running the following command:

python train_net.py --cfg configs/t2v_train.yaml

In the t2v_train.yaml configuration file, you can specify the data, adjust the video-to-image ratio using frame_lens, and try out different Diffusion settings, among other options.

  • Before training, you can download any of our open-source models for initialization. The codebase supports custom initialization and grad_scale settings, both of which are configured under the Pretrain item in the YAML file.
  • During training, you can view the saved models and intermediate inference results in the workspace/experiments/t2v_train directory.
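
To keep an eye on a running experiment, a small script can report the most recently written checkpoint. The sketch below is an illustration, not part of VGen: the helper name latest_checkpoint is hypothetical, and it assumes checkpoints are saved as .pth files somewhere under the experiment directory, which you should confirm against your own run.

```python
from pathlib import Path


def latest_checkpoint(experiment_dir, pattern="*.pth"):
    """Return the most recently modified file matching `pattern` under experiment_dir, or None."""
    root = Path(experiment_dir)
    if not root.is_dir():
        return None
    # Search recursively, since the exact sub-folder layout is an assumption here.
    candidates = [p for p in root.rglob(pattern) if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_checkpoint("workspace/experiments/t2v_train"))
```

The function returns None when the directory does not exist yet or contains no matching files, so it is safe to call before the first checkpoint is saved.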

After training is complete, you can run inference with the model using the following command:

python inference.py --cfg configs/t2v_infer.yaml

The generated videos are saved in the workspace/experiments/test_img_01 directory. For specific settings such as data, models, and seed, please refer to the t2v_infer.yaml file.

If you want to directly load our previously open-sourced Modelscope T2V model, please refer to this link.

(2) Run the I2VGen-XL model

(i) Download model and test data:

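# Install the ModelScope client first (run in a shell): pip install modelscope
from modelscope.hub.snapshot_download import snapshot_download

# Download the I2VGen-XL model and test data into the models/ cache directory
model_dir = snapshot_download('damo/I2VGen-XL', cache_dir='models/', revision='v1.0.0')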

Alternatively, you can download it from HuggingFace (https://huggingface.co/damo-vilab/i2vgen-xl):

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
git clone https://huggingface.co/damo-vilab/i2vgen-xl

(ii) Run the following command:

python inference.py --cfg configs/i2vgen_xl_infer.yaml

or you can run:

python inference.py --cfg configs/i2vgen_xl_infer.yaml  test_list_path data/test_list_for_i2vgen.txt test_model models/i2vgen_xl_00854500.pth

test_list_path points to a list of input image paths and their corresponding captions; see the demo file data/test_list_for_i2vgen.txt for the exact format and suggestions. test_model is the path of the model checkpoint to load. After a few minutes, the generated high-definition video is available in the workspace/experiments/test_list_for_i2vgen directory.

The current model performs poorly on anime images and on images with a black background because of a lack of relevant training data. We are continuing to improve it.
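
To run the model on your own images, you need a test list of your own and the matching command line. The sketch below shows one way to prepare both; it is not part of VGen. The helper names write_test_list and build_inference_command are hypothetical, and the "|||" separator between image path and caption is an assumption about the list format, so check it against data/test_list_for_i2vgen.txt before relying on it. The override syntax (trailing key value pairs after --cfg) follows the command shown above.

```python
import sys
from pathlib import Path


def write_test_list(entries, list_path, separator="|||"):
    """Write one `image_path<separator>caption` line per (image_path, caption) pair.

    The separator is an assumption; verify it against the demo list file.
    """
    lines = []
    for image_path, caption in entries:
        caption = " ".join(caption.split())  # collapse whitespace so each entry stays on one line
        lines.append(f"{image_path}{separator}{caption}")
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def build_inference_command(cfg, **overrides):
    """Build the argv list for inference.py, appending `key value` overrides after --cfg."""
    command = [sys.executable, "inference.py", "--cfg", cfg]
    for key, value in overrides.items():
        command += [key, str(value)]
    return command


if __name__ == "__main__":
    command = build_inference_command(
        "configs/i2vgen_xl_infer.yaml",
        test_list_path="data/test_list_for_i2vgen.txt",
        test_model="models/i2vgen_xl_00854500.pth",
    )
    print(" ".join(command))
```

The resulting list can be passed to subprocess.run from the root of the repository. Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.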

(iii) Run the gradio app locally:

python gradio_app.py

(iv) Run the model on ModelScope and HuggingFace:

Because the GIF format compresses video quality, please click 'HERE' below to view the original video.

Input Image

Click HERE to view the generated video.

Input Image

Click HERE to view the generated video.

Input Image

Click HERE to view the generated video.
