MiniSora Community
[![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Issues][issues-shield]][issues-url]
[![MIT License][license-shield]][license-url]
[![Stargazers][stars-shield]][stars-url]
English | 简体中文
👋 Join us on WeChat
The MiniSora open-source community is a community-driven initiative organized spontaneously by its members. It aims to explore the implementation path and future development direction of Sora.
- We will hold regular round-table discussions with the Sora team and the community to explore possibilities.
- We will delve into existing technological pathways for video generation.
- We will lead the replication of papers and research results related to Sora, such as DiT (MiniSora-DiT).
- We will conduct a comprehensive review of Sora-related technologies and their implementations, i.e., "From DDPM to Sora: A Review of Video Generation Models Based on Diffusion Models".
Hot News
- Stable Diffusion 3: MM-DiT: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
- MiniSora-DiT: Reproducing the DiT Paper with XTuner
- Introduction of MiniSora and Latest Progress in Replicating Sora
Reproduction Group of MiniSora Community
Sora Reproduction Goals of MiniSora
- GPU-Friendly: Ideally, low requirements on GPU memory and GPU count, e.g., trainable and inferable with 8 A100 80G cards, 8 A6000 48G cards, or an RTX 4090 24G.
- Training-Efficiency: Achieves good results without extensive training time.
- Inference-Efficiency: Generated videos need not be long or high-resolution; 3-10 seconds in length at 480p resolution is acceptable.
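To get a rough sense of scale for these targets, the sketch below estimates the sequence length a DiT-style model would face for a 10-second 480p clip, assuming an 8x VAE downsample, 2x2 spatial patches, and 8 fps latent frames. These numbers are illustrative assumptions, not MiniSora settings:

```python
# Back-of-envelope estimate of sequence length for a DiT-style video model.
# All parameters below are illustrative assumptions, not MiniSora settings.

def spacetime_patch_count(height, width, frames,
                          patch=2, vae_downsample=8):
    """Number of spacetime patches after a VAE downsample and p x p patchify."""
    lat_h = height // vae_downsample   # latent height
    lat_w = width // vae_downsample    # latent width
    return (lat_h // patch) * (lat_w // patch) * frames

# 480p (854x480), 10 s at an assumed 8 fps -> 80 latent frames
tokens = spacetime_patch_count(480, 854, 80)
print(tokens)  # 127200 spacetime patches under these assumptions
```

A sequence of this length is what motivates both the GPU-memory goal above and the long-sequence training techniques listed later under Efficient Training.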
MiniSora-DiT: Reproducing the DiT Paper with XTuner
https://github.com/mini-sora/minisora-DiT
Requirements
We are recruiting MiniSora Community contributors to reproduce DiT using XTuner.

We hope community members have the following background:

- Familiarity with the OpenMMLab MMEngine mechanism.
- Familiarity with DiT.
Background
- The author of DiT is the same as the author of Sora.
- XTuner has the core technology to efficiently train sequences of length 1000K.
Support
Recent Round-Table Discussions
Interpretation of the Stable Diffusion 3 Paper: MM-DiT
Speaker: MMagic Core Contributors
Live Streaming Time: 03/12 20:00
Highlights: MMagic core contributors will lead us in interpreting the Stable Diffusion 3 paper, discussing the architecture details and design principles of Stable Diffusion 3.
PPT: FeiShu Link
Highlights from Previous Discussions
Night Talk with Sora: Video Diffusion Overview
ZhiHu Notes: A Survey on Generative Diffusion Models: An Overview of Generative Diffusion Models
Paper Reading Program
- Technical Report: Video generation models as world simulators
- Latte: Latent Diffusion Transformer for Video Generation
- Stable Cascade (ICLR 24 Paper): Würstchen: An efficient architecture for large-scale text-to-image diffusion models
- Stable Diffusion 3: MM-DiT: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
- Updating...
Recruitment of Presenters
Related Work
- 01 Diffusion Model
- 02 Diffusion Transformer
- 03 Baseline Video Generation Models
- 04 Diffusion UNet
- 05 Video Generation
- 06 Dataset
- 6.1 Public Datasets
- 6.2 Video Augmentation Methods
- 6.2.1 Basic Transformations
- 6.2.2 Feature Space
- 6.2.3 GAN-based Augmentation
- 6.2.4 Encoder/Decoder Based
- 6.2.5 Simulation
- 07 Patchifying Methods
- 08 Long-context
- 09 Audio Related Resource
- 10 Consistency
- 11 Prompt Engineering
- 12 Security
- 13 World Model
- 14 Video Compression
- 15 Mamba
- 16 Existing high-quality resources
- 17 Efficient Training
- 17.1 Parallelism based Approach
- 17.1.1 Data Parallelism (DP)
- 17.1.2 Model Parallelism (MP)
- 17.1.3 Pipeline Parallelism (PP)
- 17.1.4 Generalized Parallelism (GP)
- 17.1.5 ZeRO Parallelism (ZP)
- 17.2 Non-parallelism based Approach
- 17.2.1 Reducing Activation Memory
- 17.2.2 CPU-Offloading
- 17.2.3 Memory Efficient Optimizer
- 17.3 Novel Structure
- 18 Efficient Inference
- 18.1 Reduce Sampling Steps
- 18.1.1 Continuous Steps
- 18.1.2 Fast Sampling
- 18.1.3 Step distillation
- 18.2 Optimizing Inference
- 18.2.1 Low-bit Quantization
- 18.2.2 Parallel/Sparse inference
01 Diffusion Models

Paper | Link |
---|---|
1) Guided-Diffusion: Diffusion Models Beat GANs on Image Synthesis | NeurIPS 21 Paper, GitHub |
2) Latent Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models | CVPR 22 Paper, GitHub |
3) EDM: Elucidating the Design Space of Diffusion-Based Generative Models | NeurIPS 22 Paper, GitHub |
4) DDPM: Denoising Diffusion Probabilistic Models | NeurIPS 20 Paper, GitHub |
5) DDIM: Denoising Diffusion Implicit Models | ICLR 21 Paper, GitHub |
6) Score-Based Diffusion: Score-Based Generative Modeling through Stochastic Differential Equations | ICLR 21 Paper, GitHub, Blog |
7) Stable Cascade: Würstchen: An efficient architecture for large-scale text-to-image diffusion models | ICLR 24 Paper, GitHub, Blog |
8) Diffusion Models in Vision: A Survey | TPAMI 23 Paper, GitHub |
9) Improved DDPM: Improved Denoising Diffusion Probabilistic Models | ICML 21 Paper, GitHub |
10) Classifier-Free Diffusion Guidance | NeurIPS 21 Paper |
11) GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models | Paper, GitHub |
12) VQ-DDM: Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation | CVPR 22 Paper, GitHub |
13) Diffusion Models for Medical Anomaly Detection | Paper, GitHub |
14) Generation of Anonymous Chest Radiographs Using Latent Diffusion Models for Training Thoracic Abnormality Classification Systems | Paper |
15) DiffusionDet: Diffusion Model for Object Detection | ICCV 23 Paper, GitHub |
16) Label-efficient semantic segmentation with diffusion models | ICLR 22 Paper, GitHub, Project |
02 Diffusion Transformer

Paper | Link |
---|---|
1) UViT: All are Worth Words: A ViT Backbone for Diffusion Models | CVPR 23 Paper, GitHub, ModelScope |
2) DiT: Scalable Diffusion Models with Transformers | ICCV 23 Paper, GitHub, Project, ModelScope |
3) SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers | ArXiv 23, GitHub, ModelScope |
4) FiT: Flexible Vision Transformer for Diffusion Model | ArXiv 24, GitHub |
5) k-diffusion: Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers | ArXiv 24, GitHub |
6) Large-DiT: Large Diffusion Transformer | GitHub |
7) VisionLLaMA: A Unified LLaMA Interface for Vision Tasks | ArXiv 24, GitHub |
8) Stable Diffusion 3: MM-DiT: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis | Paper, Blog |
9) PIXART-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | ArXiv 24, Project |
10) PIXART-α: Fast Training of Diffusion Transformer for Photorealistic |