
minisora
An open-source community dedicated to exploring AI video generation
MiniSora is a community-driven open-source project focused on exploring implementation paths for Sora-style AI video generation. The project holds regular round-table discussions, studies video generation techniques in depth, reproduces related papers, and publishes technical reviews. MiniSora aims to develop GPU-friendly, training-efficient, and inference-fast AI video generation methods, advancing open-source progress in AI video generation.
MiniSora Community
<!-- PROJECT SHIELDS -->[![Contributors][contributors-shield]][contributors-url] [![Forks][forks-shield]][forks-url] [![Issues][issues-shield]][issues-url] [![MIT License][license-shield]][license-url] [![Stargazers][stars-shield]][stars-url] <br />
<!-- PROJECT LOGO --> <div align="center"> <img src="assets/logo.jpg" width="600"/> </div> <div align="center">English | 简体中文
</div> <p align="center"> 👋 join us on <a href="https://cdn.vansin.top/minisora.jpg" target="_blank">WeChat</a> </p>

The MiniSora open-source community is a community-driven initiative organized spontaneously by community members to explore the implementation path and future development direction of Sora.
- Regular round-table discussions will be held with the Sora team and the community to explore possibilities.
- We will delve into existing technological pathways for video generation.
- Leading the replication of papers and research results related to Sora, such as DiT (MiniSora-DiT).
- Conducting a comprehensive review of Sora-related technologies and their implementations, i.e., "From DDPM to Sora: A Review of Video Generation Models Based on Diffusion Models" (the DDPM starting point of that lineage is recapped just below).
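For readers new to the lineage that review traces, the starting point is the standard DDPM formulation (a textbook recap, not material taken from the review itself): noise is added to data $x_0$ over $t$ steps with schedule $\beta_t$, and a network $\epsilon_\theta$ is trained to predict that noise.

$$
q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)I\big), \qquad \bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)
$$

$$
\mathcal{L}_{\text{simple}} = \mathbb{E}_{t,\,x_0,\,\epsilon}\Big[\big\lVert \epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\big)\big\rVert^2\Big]
$$

Much of the work surveyed keeps this objective and varies the backbone and the space (pixels vs. latents) it runs in.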
Hot News
- Stable Diffusion 3: MM-DiT: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
- MiniSora-DiT: Reproducing the DiT Paper with XTuner
- Introduction of MiniSora and Latest Progress in Replicating Sora
Reproduction Group of MiniSora Community
Sora Reproduction Goals of MiniSora
- GPU-Friendly: Ideally, it should have low requirements for GPU memory and GPU count, e.g., trainable and inferable with 8 A100 80G cards, 8 A6000 48G cards, or an RTX 4090 24G (see the back-of-envelope estimate after this list).
- Training-Efficiency: It should achieve good results without requiring extensive training time.
- Inference-Efficiency: Generated videos need not be long or high-resolution; 3-10 seconds in length at 480p is acceptable.
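As a rough sanity check on the GPU-friendly goal, here is a back-of-envelope estimate of the persistent training state for a DiT-XL/2-sized model (675M parameters, the size reported in the DiT paper); the per-parameter byte counts are common mixed-precision AdamW assumptions, not measured numbers:

```python
# Back-of-envelope training-state estimate for a DiT-XL/2-sized model.
# Assumptions: fp32 master weights + AdamW moments, fp16 working copies
# of weights and gradients; activation memory is ignored here.
params = 675e6  # DiT-XL/2 parameter count from the DiT paper

bytes_per_param = (
    4        # fp32 master weights
    + 4 * 2  # AdamW first and second moments (fp32)
    + 2      # fp16 weight copy
    + 2      # fp16 gradients
)            # = 16 bytes per parameter

state_gib = params * bytes_per_param / 1024**3
print(f"weights + optimizer state: {state_gib:.1f} GiB")  # ~10.1 GiB
```

Even before activations, checkpointing, and EMA weights are counted, the persistent state alone fits on a single 24G card, which is why budgets like one RTX 4090 or 8 A6000 48G cards are plausible targets for a model of this scale.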
MiniSora-DiT: Reproducing the DiT Paper with XTuner
https://github.com/mini-sora/minisora-DiT
Requirements
We are recruiting MiniSora Community contributors to reproduce DiT using XTuner.

We hope the community member has the following characteristics:

- Familiarity with the OpenMMLab MMEngine mechanism (a minimal example follows this list).
- Familiarity with DiT.
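For contributors unfamiliar with the MMEngine mechanism, the core abstraction is the `Runner`: the model subclasses `BaseModel` and returns a loss dict in `'loss'` mode, while the optimizer and loop settings are plain config dicts. A minimal self-contained sketch with a toy model and random data (generic MMEngine usage, not the MiniSora-DiT training setup):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from mmengine.model import BaseModel
from mmengine.runner import Runner


class ToyRegressor(BaseModel):
    """MMEngine models subclass BaseModel and return a dict of losses
    when called with mode='loss'; the Runner drives the training loop."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 1)

    def forward(self, inputs, targets, mode='loss'):
        preds = self.linear(inputs)
        if mode == 'loss':
            return {'loss': F.mse_loss(preds, targets)}
        return preds


# Random toy data; each batch arrives as [inputs, targets] and is
# unpacked positionally into forward().
dataset = TensorDataset(torch.randn(256, 8), torch.randn(256, 1))
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)

runner = Runner(
    model=ToyRegressor(),
    work_dir='./work_dir',          # logs and checkpoints land here
    train_dataloader=train_loader,
    optim_wrapper=dict(optimizer=dict(type='AdamW', lr=1e-3)),
    train_cfg=dict(by_epoch=True, max_epochs=2),
)
runner.train()
```

XTuner builds on this same Runner mechanism, so comfort with the pattern above transfers directly.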
Background
- The author of DiT is the same as the author of Sora.
- XTuner has the core technology to efficiently train sequences of length 1000K (a rough token-count estimate follows this list).
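To see why that sequence length matters, here is a rough spacetime-token count for a clip at the reproduction targets above, assuming common latent-diffusion choices (8x spatial VAE downsampling and 2x2 patching, as in DiT; these are assumptions, not known Sora internals):

```python
# Rough spacetime-token count for a 10 s, 480p clip under assumed
# latent-diffusion settings: 8x spatial VAE downsampling, 2x2 patches,
# no temporal compression.
width, height = 854, 480   # 480p frame
fps, seconds = 24, 10

lat_w, lat_h = width // 8, height // 8          # 106 x 60 latent grid
tokens_per_frame = (lat_w // 2) * (lat_h // 2)  # 53 * 30 = 1590
total_tokens = tokens_per_frame * fps * seconds
print(total_tokens)  # 381,600 tokens
```

Already ~380K tokens for a short 480p clip; longer or higher-resolution video pushes well past one million tokens, which is the regime the 1000K sequence-length claim above addresses.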
Support
Recent Round-table Discussions
Paper Interpretation of Stable Diffusion 3: MM-DiT
Speaker: MMagic Core Contributors
Live Streaming Time: 03/12 20:00
Highlights: MMagic core contributors will lead an interpretation of the Stable Diffusion 3 paper, discussing its architecture details and design principles (a schematic sketch of the MM-DiT joint attention follows below).
PPT: FeiShu Link
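As background for the session, the headline idea of MM-DiT is to give text and image tokens separate weight streams (their own QKV and output projections) while running a single joint self-attention over the concatenated sequence. A schematic sketch of that joint attention, with invented dimensions, written from the paper's block diagram rather than any released code:

```python
import torch
import torch.nn as nn


class JointAttention(nn.Module):
    """Schematic MM-DiT joint attention: per-modality QKV projections,
    one attention pass over the concatenated text+image sequence."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.heads = heads
        self.qkv_img = nn.Linear(dim, 3 * dim)   # image-stream weights
        self.qkv_txt = nn.Linear(dim, 3 * dim)   # text-stream weights
        self.proj_img = nn.Linear(dim, dim)
        self.proj_txt = nn.Linear(dim, dim)

    def forward(self, img, txt):
        # img: (B, N_img, dim), txt: (B, N_txt, dim)
        B, n_img, d = img.shape
        qkv = torch.cat([self.qkv_img(img), self.qkv_txt(txt)], dim=1)
        q, k, v = qkv.chunk(3, dim=-1)

        def split_heads(t):
            return t.view(B, -1, self.heads, d // self.heads).transpose(1, 2)

        q, k, v = map(split_heads, (q, k, v))
        out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
        out = out.transpose(1, 2).reshape(B, -1, d)
        # Split back into modalities; each stream keeps its own projection.
        img_out, txt_out = out[:, :n_img], out[:, n_img:]
        return self.proj_img(img_out), self.proj_txt(txt_out)


x_img = torch.randn(2, 256, 512)   # e.g. 16x16 latent patches
x_txt = torch.randn(2, 77, 512)    # e.g. CLIP-length text tokens
y_img, y_txt = JointAttention()(x_img, x_txt)
```

The per-modality weights let each stream specialize while the shared attention mixes information across modalities in both directions.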
<!-- Please scan the QR code with WeChat to book a live video session. <div align="center"> <img src="assets/SD3论文领读.png" width="100"/> </div> -->

Highlights from Previous Discussions
Night Talk with Sora: Video Diffusion Overview
ZhiHu Notes: A Survey on Generative Diffusion Model: An Overview of Generative Diffusion Models
Paper Reading Program
- Technical Report: Video generation models as world simulators
- Latte: Latte: Latent Diffusion Transformer for Video Generation
- Stable Cascade (ICLR 24 Paper): Würstchen: An efficient architecture for large-scale text-to-image diffusion models
- Stable Diffusion 3: MM-DiT: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
- Updating...
Recruitment of Presenters
Related Work
- 01 Diffusion Models
- 02 Diffusion Transformer
- 03 Baseline Video Generation Models
- 04 Diffusion UNet
- 05 Video Generation
- 06 Dataset
- 6.1 Public Datasets
- 6.2 Video Augmentation Methods
- 6.2.1 Basic Transformations
- 6.2.2 Feature Space
- 6.2.3 GAN-based Augmentation
- 6.2.4 Encoder/Decoder Based
- 6.2.5 Simulation
- 07 Patchifying Methods
- 08 Long-context
- 09 Audio Related Resource
- 10 Consistency
- 11 Prompt Engineering
- 12 Security
- 13 World Model
- 14 Video Compression
- 15 Mamba
- 16 Existing high-quality resources
- 17 Efficient Training
- 17.1 Parallelism based Approach
- 17.1.1 Data Parallelism (DP)
- 17.1.2 Model Parallelism (MP)
- 17.1.3 Pipeline Parallelism (PP)
- 17.1.4 Generalized Parallelism (GP)
- 17.1.5 ZeRO Parallelism (ZP)
- 17.2 Non-parallelism based Approach
- 17.2.1 Reducing Activation Memory
- 17.2.2 CPU-Offloading
- 17.2.3 Memory Efficient Optimizer
- 17.3 Novel Structure
- 18 Efficient Inference
- 18.1 Reduce Sampling Steps
- 18.1.1 Continuous Steps
- 18.1.2 Fast Sampling
- 18.1.3 Step distillation
- 18.2 Optimizing Inference
- 18.2.1 Low-bit Quantization
- 18.2.2 Parallel/Sparse inference
<h3 id="diffusion-models">01 Diffusion Models</h3>

| Paper | Link |
| --- | --- |
| 1) Guided-Diffusion: Diffusion Models Beat GANs on Image Synthesis | NeurIPS 21 Paper, GitHub |
| 2) Latent Diffusion: High-Resolution Image Synthesis with Latent Diffusion Models | CVPR 22 Paper, GitHub |
| 3) EDM: Elucidating the Design Space of Diffusion-Based Generative Models | NeurIPS 22 Paper, GitHub |
| 4) DDPM: Denoising Diffusion Probabilistic Models | NeurIPS 20 Paper, GitHub |
| 5) DDIM: Denoising Diffusion Implicit Models | ICLR 21 Paper, GitHub |
| 6) Score-Based Diffusion: Score-Based Generative Modeling through Stochastic Differential Equations | ICLR 21 Paper, GitHub, Blog |
| 7) Stable Cascade: Würstchen: An efficient architecture for large-scale text-to-image diffusion models | ICLR 24 Paper, GitHub, Blog |
| 8) Diffusion Models in Vision: A Survey | TPAMI 23 Paper, GitHub |
| 9) Improved DDPM: Improved Denoising Diffusion Probabilistic Models | ICML 21 Paper, GitHub |
| 10) Classifier-free diffusion guidance | NeurIPS 21 Paper |
| 11) Glide: Towards photorealistic image generation and editing with text-guided diffusion models | Paper, GitHub |
| 12) VQ-DDM: Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation | CVPR 22 Paper, GitHub |
| 13) Diffusion Models for Medical Anomaly Detection | Paper, GitHub |
| 14) Generation of Anonymous Chest Radiographs Using Latent Diffusion Models for Training Thoracic Abnormality Classification Systems | Paper |
| 15) DiffusionDet: Diffusion Model for Object Detection | ICCV 23 Paper, GitHub |
| 16) Label-efficient semantic segmentation with diffusion models | ICLR 22 Paper, GitHub, Project |
<h3 id="diffusion-transformer">02 Diffusion Transformer</h3>

| Paper | Link |
| --- | --- |
| 1) UViT: All are Worth Words: A ViT Backbone for Diffusion Models | CVPR 23 Paper, GitHub, ModelScope |
| 2) DiT: Scalable Diffusion Models with Transformers | ICCV 23 Paper, GitHub, Project, ModelScope |
| 3) SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers | ArXiv 23, GitHub, ModelScope |
| 4) FiT: Flexible Vision Transformer for Diffusion Model | ArXiv 24, GitHub |
| 5) k-diffusion: Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers | ArXiv 24, GitHub |
| 6) Large-DiT: Large Diffusion Transformer | GitHub |
| 7) VisionLLaMA: A Unified LLaMA Interface for Vision Tasks | ArXiv 24, GitHub |
| 8) Stable Diffusion 3: MM-DiT: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis | Paper, Blog |
| 9) PIXART-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | ArXiv 24, Project |
| 10) PIXART-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | |
编辑推荐精选


Manus
全面超越基准的 AI Agent助手
Manus 是一款通用人工智能代理平台,能够将您的创意和想法迅速转化为实际成果。无论是定制旅行规划、深入的数据分析,还是教育支持与商业决策,Manus 都能高效整合信息,提供精准解决方案。它以直观的交互体验和领先的技术,为用户开启了一个智慧驱动、轻松高效的新时代,让每个灵感都能得到完美落地。


飞书知识问答
飞书官方推出的AI知识库 上传word pdf即可部署AI私有知识库
基于DeepSeek R1大模型构建的知识管理系统,支持PDF、Word、PPT等常见文档格式解析,实现云端与本地数据的双向同步。系统具备实时网络检索能力,可自动关联外部信息源,通过语义理解技术处理结构化与非结构化数据。免费版本提供基础知识库搭建功能,适用于企业文档管理和个人学习资料整理场景。


Trae
字节跳动发布的AI编程神器IDE
Trae是一种自适应的集成开发环境(IDE),通过自动化和多元协作改变开发流程。利用Trae,团队能够更快速、精确地编写和部署代码,从而提高编程效率和项目交付速度。Trae具备上下文感知和代码自动完成功能,是提升开发效率的理想工具。

酷表ChatExcel
大模型驱动的Excel数据处理工具
基于大模型交互的表格处理系统,允许用户通过对话方式完成数据整理和可视化分析。系统采用机器学习算法解析用户指令,自动执行排序、公式计算和数据透视等操作,支持多种文件格式导入导出。数据处理响应速度保持在0.8秒以内,支持超过100万行数据的即时分析。


DeepEP
DeepSeek开源的专家并行通信优化框架
DeepEP是一个专为大规模分布式计算设计的通信库,重点解决专家并行模式中的通信瓶颈问题。其核心架构采用分层拓扑感知技术,能够自动识别节点间物理连接关系,优化数据传输路径。通过实现动态路由选择与负载均衡机制,系统在千卡级计算集群中维持稳 定的低延迟特性,同时兼容主流深度学习框架的通信接口。


DeepSeek
全球领先开源大模型,高效智能助手
DeepSeek是一家幻方量化创办的专注于通用人工智能的中国科技公司,主攻大模型研发与应用。DeepSeek-R1是开源的推理模型,擅长处理复杂任务且可免费商用。


KnowS
AI医学搜索引擎 整合4000万+实时更新的全球医学文献
医学领域专用搜索引擎整合4000万+实时更新的全球医学文献,通过自主研发AI模型实现精准知识检索。系统每日更新指南、中英文文献及会议资料,搜索准确率较传统工具提升80%,同时将大模型幻觉率控制在8%以下。支持临床建议生成、文献深度解析、学术报告制作等全流程科研辅助,典型用户反馈显示每周可节省医疗工作者70%时间。


Windsurf Wave 3
Windsurf Editor推出第三次重大更新Wave 3
新增模型上下文协议支持与智能编辑功能。本次更新包含五项核心改进:支持接入MCP协议扩展工具生态,Tab键智能跳转提升编码效率,Turbo模式实现自动化终端操作,图片拖拽功能优化多模态交互,以及面向付费用户的个性化图标定制。系统同步集成DeepSeek、Gemini等新模型,并通过信用点数机制实现差异化的资源调配。


腾讯元宝
腾讯自研的混元大模型AI助手
腾讯元宝是腾讯基于自研的混元大模型推出的一款多功能AI应用,旨在通过人工智能技术提升用户在写作、绘画、翻译、编程、搜索、阅读总结等多个领域的工作与生活效率。


Grok3
埃隆·马斯克旗下的人工智能公司 xAI 推出的第三代大规模语言模型
Grok3 是由埃隆·马斯克旗下的人工智能公司 xAI 推出的第三代大规模语言模型,常被马斯克称为“地球上最聪明的 AI”。它不仅是在前代产品 Grok 1 和 Grok 2 基础上的一次飞跃,还在多个关键技术上实现了创新突破。
推荐工具精选
AI云服务特惠
懂AI专属折扣关注微信公众号
最新AI工具、AI资讯
独家AI资源、AI项目落地

微信扫一扫关注公众号