A Collection of Video Generation Studies
This GitHub repository summarizes papers and resources related to the video generation task.
If you have any suggestions about this repository, please feel free to open a new issue or pull request.
Recent news about this GitHub repo is listed as follows.
🔥 Click to see more information.
- [Jun. 17th] All NeurIPS 2023 papers and references are updated.
- [Apr. 26th] Added a new direction: Personalized Video Generation.
- [Mar. 28th] The official AAAI 2024 paper list is released! Official versions of the PDFs and BibTeX references are updated accordingly.
Contents
To-Do Lists
- Latest Papers
- Update ECCV 2024 Papers
- Update CVPR 2024 Papers
- Update PDFs and References of ⚠️ Papers
- Update Published Versions of References
- Update AAAI 2024 Papers
- Update PDFs and References of ⚠️ Papers
- Update Published Versions of References
- Update ICLR 2024 Papers
- Update NeurIPS 2023 Papers
- Previously Published Papers
- Update Previous CVPR papers
- Update Previous ICCV papers
- Update Previous ECCV papers
- Update Previous NeurIPS papers
- Update Previous ICLR papers
- Update Previous AAAI papers
- Update Previous ACM MM papers
- Regular Maintenance of Preprint arXiv Papers and Missed Papers
Products
Name | Organization | Year | Research Paper | Website | Specialties |
---|---|---|---|---|---|
Sora | OpenAI | 2024 | link | link | - |
Lumiere | Google | 2024 | link | link | - |
VideoPoet | Google | 2023 | - | link | - |
W.A.L.T | Google | 2023 | link | link | - |
Gen-2 | Runway | 2023 | - | link | - |
Gen-1 | Runway | 2023 | - | link | - |
Animate Anyone | Alibaba | 2023 | link | link | - |
Outfit Anyone | Alibaba | 2023 | - | link | - |
Stable Video | StabilityAI | 2023 | link | link | - |
Pixeling | HiDream.ai | 2023 | - | link | - |
DomoAI | DomoAI | 2023 | - | link | - |
Emu | Meta | 2023 | link | link | - |
Genmo | Genmo | 2023 | - | link | - |
NeverEnds | NeverEnds | 2023 | - | link | - |
Moonvalley | Moonvalley | 2023 | - | link | - |
Morph Studio | Morph | 2023 | - | link | - |
Pika | Pika | 2023 | - | link | - |
PixelDance | ByteDance | 2023 | link | link | - |
Papers
Survey Papers
- Year 2024
- arXiv
- Video Diffusion Models: A Survey [Paper]
- Year 2023
- arXiv
- A Survey on Video Diffusion Models [Paper]
Text-to-Video Generation
- Year 2024
- CVPR
- Vlogger: Make Your Dream A Vlog [Paper] [Code]
- Make Pixels Dance: High-Dynamic Video Generation [Paper] [Project] [Demo]
- VGen: Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation [Paper] [Code] [Project]
- GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation [Paper] [Project]
- SimDA: Simple Diffusion Adapter for Efficient Video Generation [Paper] [Code] [Project]
- MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation [Paper] [Project] [Video]
- Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models [Paper] [Project]
- PEEKABOO: Interactive Video Generation via Masked-Diffusion [Paper] [Code] [Project] [Demo]
- EvalCrafter: Benchmarking and Evaluating Large Video Generation Models [Paper] [Code] [Project]
- A Recipe for Scaling up Text-to-Video Generation with Text-free Videos [Paper] [Code] [Project]
- BIVDiff: A Training-free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models [Paper] [Project]
- Mind the Time: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis [Paper] [Project]
- Animate Anyone: Consistent and Controllable Image-to-video Synthesis for Character Animation [Paper] [Code] [Project]
- MotionDirector: Motion Customization of Text-to-Video Diffusion Models [Paper] [Code]
- Hierarchical Patch-wise Diffusion Models for High-Resolution Video Generation [Paper] [Project]
- DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation [Paper] [Code]
- Grid Diffusion Models for Text-to-Video Generation [Paper] [Code] [Video]
- ICLR
- AAAI
- Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos [Paper] [Code] [Project]
- E2HQV: High-Quality Video Generation from Event Camera via Theory-Inspired Model-Aided Deep Learning [Paper]
- ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation [Paper] [Code] [Project]
- F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis [Paper]
- arXiv
- Lumiere: A Space-Time Diffusion Model for Video Generation [Paper] [Project]
- Boximator: Generating Rich and Controllable Motions for Video Synthesis [Paper] [Project] [Video]
- World Model on Million-Length Video And Language With RingAttention [Paper] [Code] [Project]
- Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion [Paper] [Project]
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens [Paper] [Code] [Project]
- MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation [Paper] [Project]
- Latte: Latent Diffusion Transformer for Video Generation [Paper] [Code] [Project]
- Mora: Enabling Generalist Video Generation via A Multi-Agent Framework [Paper] [Code]
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text [Paper] [Code] [Project] [Video]
- VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models [Paper]
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation [Paper] [Code] [Project] [Demo]
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model [Paper] [Code] [Project]
- Others
- Sora: Video Generation Models as World Simulators [Paper]
- Year 2023
- CVPR
- Align your Latents: High-resolution Video Synthesis with Latent Diffusion Models [Paper] [Project] [Reproduced code]
- Text2Video-Zero: Text-to-image Diffusion Models are Zero-shot Video Generators