Awesome Dataset Distillation

<img src="https://img.shields.io/badge/Contributions-Welcome-278ea5" alt="Contrib"/> <img src="https://img.shields.io/badge/Number%20of%20Papers-164-FF6F00" alt="PaperNum"/>

Awesome Dataset Distillation provides comprehensive and detailed information on the field of dataset distillation.

Dataset distillation is the task of synthesizing a small dataset such that models trained on it achieve high performance on the original large dataset. A dataset distillation algorithm takes a large real dataset (the training set) as input and outputs a small synthetic distilled dataset. The distilled dataset is evaluated by training models on it and testing them on a separate real dataset (the validation/test set). A good distilled dataset is useful not only for dataset understanding but also for various applications (e.g., continual learning, privacy, and neural architecture search).

The task was first introduced in the paper Dataset Distillation [Tongzhou Wang et al., '18], which also proposed an algorithm that uses backpropagation through optimization steps. It was first extended to real-world datasets in the paper Medical Dataset Distillation [Guang Li et al., '19], which also explored the privacy-preservation potential of dataset distillation. The paper Dataset Condensation [Bo Zhao et al., '20] first introduced gradient matching, which greatly advanced the dataset distillation field.
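The evaluation protocol described above (distill the training set, train a model on the distilled set only, test on held-out real data) can be sketched in a few lines. This is a toy illustration under stated assumptions, not any method from the papers listed here: the data are synthetic 2-D Gaussian blobs, the "distillation" step is a naive class-mean placeholder, and all function names (`make_blobs`, `distill_class_means`, `train_logistic`, `accuracy_of`) are hypothetical.

```python
import math
import random

def make_blobs(n_per_class, seed):
    """Toy stand-in for a real dataset: two Gaussian blobs in 2-D."""
    rng = random.Random(seed)
    centers = {0: (-2.0, 0.0), 1: (2.0, 0.0)}
    return [((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label)
            for label, (cx, cy) in centers.items()
            for _ in range(n_per_class)]

def distill_class_means(train_set):
    """Naive placeholder 'distillation': one synthetic point per class (its mean).

    Assumption for illustration only; real methods optimize the synthetic points.
    """
    sums, counts = {}, {}
    for (x0, x1), label in train_set:
        s0, s1 = sums.get(label, (0.0, 0.0))
        sums[label] = (s0 + x0, s1 + x1)
        counts[label] = counts.get(label, 0) + 1
    return [((sums[c][0] / counts[c], sums[c][1] / counts[c]), c)
            for c in sorted(sums)]

def train_logistic(dataset, epochs=200, lr=0.1):
    """Fit a 2-D logistic regression with plain SGD."""
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x0, x1), label in dataset:
            p = 1.0 / (1.0 + math.exp(-(w0 * x0 + w1 * x1 + b)))
            g = p - label  # gradient of the log loss w.r.t. the logit
            w0 -= lr * g * x0
            w1 -= lr * g * x1
            b -= lr * g
    return w0, w1, b

def accuracy_of(model, dataset):
    """Fraction of points whose predicted class matches the label."""
    w0, w1, b = model
    correct = sum(int((w0 * x0 + w1 * x1 + b > 0) == (label == 1))
                  for (x0, x1), label in dataset)
    return correct / len(dataset)

train_set = make_blobs(n_per_class=500, seed=0)  # "large" real training set
test_set = make_blobs(n_per_class=500, seed=1)   # separate real test set

distilled = distill_class_means(train_set)       # small synthetic dataset
model = train_logistic(distilled)                # train on distilled data only
accuracy = accuracy_of(model, test_set)          # evaluate on real data
print(f"{len(distilled)} synthetic points, test accuracy {accuracy:.3f}")
```

The point of the sketch is the split of roles: the model never sees `train_set` directly, and the quality of the distilled set is judged only by accuracy on `test_set`.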

In recent years (2022-now), dataset distillation has gained increasing attention in the research community, across many institutes and labs, and more papers are being published each year. This research has steadily improved dataset distillation and explored its variants and applications.

This project is curated and maintained by Guang Li, Bo Zhao, and Tongzhou Wang.

How to submit a pull request?

:globe_with_meridians: Project Page
:octocat: Code
:book: bibtex

Latest Updates

[Call for papers] The First Dataset Distillation Challenge (Kai Wang & Ahmad Sajedi et al., ECCV 2024) :globe_with_meridians: :octocat:
[2024/08/07] Prioritize Alignment in Dataset Distillation (Zekai Li & Ziyao Guo et al., 2024) :octocat: :book:
[2024/08/02] Dataset Distillation for Offline Reinforcement Learning (Jonathan Light & Yuanzhe Liu et al., ICML 2024 Workshop) :globe_with_meridians: :octocat: :book:
[2024/07/29] An Aggregation-Free Federated Learning for Tackling Data Heterogeneity (Yuan Wang et al., CVPR 2024) :book:
[2024/07/25] Dataset Distillation in Medical Imaging: A Feasibility Study (Muyang Li et al., 2024) :book:
[2024/07/25] A Theoretical Study of Dataset Distillation (Zachary Izzo et al., NeurIPS 2023 Workshop) :book:
[2024/07/23] Dataset Distillation by Automatic Training Trajectories (Dai Liu et al., ECCV 2024) :octocat: :book:
[2024/07/16] FYI: Flip Your Images for Dataset Distillation (Byunggwan Son et al., ECCV 2024) :globe_with_meridians: :book:
[2024/07/13] Differentially Private Dataset Condensation (Zheng et al., NDSS 2024 Workshop) :book:
[2024/07/11] Dataset Quantization with Active Learning based Adaptive Sampling (Zhenghao Zhao et al., ECCV 2024) :octocat: :book:

Main
Applications
- Continual Learning
- Privacy
- Medical
- Federated Learning
- Graph Neural Network
- Neural Architecture Search
- Fashion, Art, and Design
- Knowledge Distillation
- Recommender Systems
- Blackbox Optimization
- Trustworthy
- Text
- Tabular
- Retrieval
- Video
- Domain Adaptation
- Super Resolution
- Time Series
- Speech
- Machine Unlearning
- Reinforcement Learning

<a name="main" />

Main

Dataset Distillation (Tongzhou Wang et al., 2018) :globe_with_meridians: :octocat: :book:
