Awesome Dataset Distillation
Awesome Dataset Distillation provides the most comprehensive and detailed information on the Dataset Distillation field.
Dataset distillation is the task of synthesizing a small dataset such that models trained on it achieve high performance on the original large dataset. A dataset distillation algorithm takes a large real dataset (the training set) as input and outputs a small synthetic distilled dataset, which is evaluated by training models on it and testing them on a separate real dataset (the validation/test set). A good distilled dataset is useful not only for dataset understanding but also for various applications (e.g., continual learning, privacy, and neural architecture search). The task was first introduced in the paper Dataset Distillation [Tongzhou Wang et al., '18], together with an algorithm based on backpropagation through optimization steps. It was later extended to real-world datasets in the paper Medical Dataset Distillation [Guang Li et al., '19], which also explored the privacy-preserving potential of dataset distillation. The paper Dataset Condensation [Bo Zhao et al., '20] then introduced gradient matching, which greatly advanced the development of the field (a minimal sketch of this idea follows below).
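To make the core idea concrete, the snippet below is a minimal sketch (in PyTorch, not the official code of any listed paper) of the gradient-matching objective from Dataset Condensation: the synthetic images are optimized so that the network gradients they induce match the gradients computed on real data. The toy linear model, batch sizes, learning rate, and the helper name `gradient_match_step` are illustrative assumptions.

```python
# Minimal gradient-matching sketch for dataset condensation.
# Illustrative only; hyperparameters and the toy model are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_match_step(model, x_real, y_real, x_syn, y_syn, syn_opt):
    """One update of the synthetic images x_syn (labels y_syn are fixed)."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradients of the network loss on a real batch (treated as targets).
    g_real = torch.autograd.grad(F.cross_entropy(model(x_real), y_real), params)
    g_real = [g.detach() for g in g_real]

    # Gradients on the synthetic batch, kept in the graph so the matching
    # loss can be backpropagated into the synthetic images themselves.
    g_syn = torch.autograd.grad(
        F.cross_entropy(model(x_syn), y_syn), params, create_graph=True
    )

    # Layer-wise cosine-distance matching loss between the two gradient sets.
    loss = sum(1 - F.cosine_similarity(a.flatten(), b.flatten(), dim=0)
               for a, b in zip(g_syn, g_real))

    syn_opt.zero_grad()
    loss.backward()
    syn_opt.step()
    return loss.item()

if __name__ == "__main__":
    # Toy setup: 1 synthetic image per class for a 10-class, 32x32 RGB task.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    x_syn = torch.randn(10, 3, 32, 32, requires_grad=True)
    y_syn = torch.arange(10)
    syn_opt = torch.optim.SGD([x_syn], lr=0.1)

    x_real = torch.randn(64, 3, 32, 32)
    y_real = torch.randint(0, 10, (64,))
    print(gradient_match_step(model, x_real, y_real, x_syn, y_syn, syn_opt))
```

In the full method, this step is repeated over many randomly initialized networks and interleaved with short training of the network on the synthetic data; the trajectory-matching and distribution-matching papers listed below replace the matched quantity (per-step gradients) with training trajectories or feature statistics.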
In recent years (2022-now), dataset distillation has gained increasing attention in the research community, with many institutes and labs contributing and more papers published each year. These works have steadily improved dataset distillation and explored its many variants and applications.
This project is curated and maintained by Guang Li, Bo Zhao, and Tongzhou Wang.
How to submit a pull request?
- :globe_with_meridians: Project Page
- :octocat: Code
- :book: BibTeX
Latest Updates
- [Call for papers] The First Dataset Distillation Challenge (Kai Wang & Ahmad Sajedi et al., ECCV 2024) :globe_with_meridians: :octocat:
- [2024/08/07] Prioritize Alignment in Dataset Distillation (Zekai Li & Ziyao Guo et al., 2024) :octocat: :book:
- [2024/08/02] Dataset Distillation for Offline Reinforcement Learning (Jonathan Light & Yuanzhe Liu et al., ICML 2024 Workshop) :globe_with_meridians: :octocat: :book:
- [2024/07/29] An Aggregation-Free Federated Learning for Tackling Data Heterogeneity (Yuan Wang et al., CVPR 2024) :book:
- [2024/07/25] Dataset Distillation in Medical Imaging: A Feasibility Study (Muyang Li et al., 2024) :book:
- [2024/07/25] A Theoretical Study of Dataset Distillation (Zachary Izzo et al., NeurIPS 2023 Workshop) :book:
- [2024/07/23] Dataset Distillation by Automatic Training Trajectories (Dai Liu et al., ECCV 2024) :octocat: :book:
- [2024/07/16] FYI: Flip Your Images for Dataset Distillation (Byunggwan Son et al., ECCV 2024) :globe_with_meridians: :book:
- [2024/07/13] Differentially Private Dataset Condensation (Zheng et al., NDSS 2024 Workshop) :book:
- [2024/07/11] Dataset Quantization with Active Learning based Adaptive Sampling (Zhenghao Zhao et al., ECCV 2024) :octocat: :book:
Contents
- Main
- Early Work
- Gradient/Trajectory Matching Surrogate Objective
- Distribution/Feature Matching Surrogate Objective
- Better Optimization
- Better Understanding
- Distilled Dataset Parametrization
- Generative Prior
- Label Distillation
- Dataset Quantization
- Multimodal Distillation
- Self-Supervised Distillation
- Benchmark
- Survey
- Ph.D. Thesis
- Workshop
- Challenge
- Applications
- Continual Learning
- Privacy
- Medical
- Federated Learning
- Graph Neural Network
- Neural Architecture Search
- Fashion, Art, and Design
- Knowledge Distillation
- Recommender Systems
- Blackbox Optimization
- Trustworthy
- Text
- Tabular
- Retrieval
- Video
- Domain Adaptation
- Super Resolution
- Time Series
- Speech
- Machine Unlearning
- Reinforcement Learning
Main
- Dataset Distillation (Tongzhou Wang et al., 2018) :globe_with_meridians: :octocat: :book:
Early Work
- Gradient-Based Hyperparameter Optimization Through Reversible Learning (Dougal Maclaurin et al., ICML 2015) :octocat: :book:
Gradient/Trajectory Matching Surrogate Objective
- Dataset Condensation with Gradient Matching (Bo Zhao et al., ICLR 2021) :octocat: :book:
- Dataset Condensation with Differentiable Siamese Augmentation (Bo Zhao et al., ICML 2021) :octocat: :book:
- Dataset Distillation by Matching Training Trajectories (George Cazenavette et al., CVPR 2022) :globe_with_meridians: :octocat: :book:
- Dataset Condensation with Contrastive Signals (Saehyung Lee et al., ICML 2022) :octocat: :book:
- Loss-Curvature Matching for Dataset Selection and Condensation (Seungjae Shin & Heesun Bae et al., AISTATS 2023) :octocat: :book:
- Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation (Jiawei Du & Yidi Jiang et al., CVPR 2023) :octocat: :book:
- Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory (Justin Cui et al., ICML 2023) :octocat: :book:
- Sequential Subset Matching for Dataset Distillation (Jiawei Du et al., NeurIPS 2023) :octocat: :book:
- Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching (Ziyao Guo & Kai Wang et al., ICLR 2024) :globe_with_meridians: :octocat: :book:
- SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching (Yongmin Lee et al., ICML 2024) :octocat: :book:
- Dataset Distillation by Automatic Training Trajectories (Dai Liu et al., ECCV 2024) :octocat: :book:
- Prioritize Alignment in Dataset Distillation (Zekai Li & Ziyao Guo et al., 2024) :octocat: :book:
Distribution/Feature Matching Surrogate Objective
- CAFE: Learning to Condense Dataset by Aligning Features (Kai Wang & Bo Zhao et al., CVPR 2022) :octocat: :book:
- Dataset Condensation with Distribution Matching (Bo Zhao et al., WACV 2023) :octocat: :book:
- Improved Distribution Matching for Dataset Condensation (Ganlong Zhao et al., CVPR 2023) :octocat: :book:
- DataDAM: Efficient Dataset Distillation with Attention Matching (Ahmad Sajedi & Samir Khaki et al., ICCV 2023) :globe_with_meridians: :octocat: :book:
- Dataset Distillation via the Wasserstein Metric (Haoyang Liu et al., 2023) :book:
- M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy (Hansong Zhang & Shikun Li et al., AAAI 2024) :octocat: :book:
- On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm (Peng Sun et al., CVPR 2024) :octocat: :book:
- Exploiting Inter-sample and Inter-feature Relations in Dataset Distillation (Wenxiao Deng et al., CVPR 2024) :octocat: :book:
- DANCE: Dual-View Distribution Alignment for Dataset Condensation (Hansong Zhang et al., IJCAI 2024) :octocat: :book:
Better Optimization
- Dataset Meta-Learning from Kernel Ridge-Regression (Timothy Nguyen et al., ICLR 2021) :octocat: :book:
- Dataset Distillation with Infinitely Wide Convolutional Networks (Timothy Nguyen et al., NeurIPS 2021) :octocat: :book:
- Dataset Distillation using Neural Feature Regression (Yongchao Zhou et al., NeurIPS 2022) :globe_with_meridians: :octocat: :book:
- Efficient Dataset Distillation using Random Feature Approximation (Noel Loo et al., NeurIPS 2022) :octocat: :book: