Dataset Condensation is a newly emerging technique that aims to learn a tiny dataset capturing the rich information encoded in the original dataset. As the datasets that contemporary machine learning models rely on grow ever larger, condensation methods have become a prominent direction for accelerating network training and reducing data storage. Although numerous methods have been proposed in this rapidly growing field, evaluating and comparing them is non-trivial and remains an open issue.
This work provides the first large-scale standardized benchmark on Dataset Condensation. It consists of a suite of evaluations that comprehensively reflect the generalizability and effectiveness of condensation methods through the lens of their generated datasets. The benchmark library, including evaluators, baseline methods, and generated datasets, is open-sourced in the DC-Bench GitHub repository.
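The core evaluation in condensation benchmarks trains a fresh model on the condensed set and then reports its accuracy on the original, real test set. A minimal sketch of that protocol, using a nearest-centroid classifier as a stand-in for the neural networks actually trained (all function names here are illustrative, not the benchmark's API):

```python
from collections import defaultdict

def evaluate_condensed(x_syn, y_syn, x_test, y_test):
    """Train a simple nearest-centroid classifier on the condensed
    (synthetic) set, then score it on the real test set."""
    # Group condensed points by class label.
    by_class = defaultdict(list)
    for x, y in zip(x_syn, y_syn):
        by_class[y].append(x)
    # One centroid per class: the mean of its condensed points.
    centroids = {
        c: [sum(dim) / len(pts) for dim in zip(*pts)]
        for c, pts in by_class.items()
    }
    def predict(x):
        # Nearest centroid by squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
    correct = sum(predict(x) == y for x, y in zip(x_test, y_test))
    return correct / len(x_test)

# Toy example: 2 classes, IPC 2 condensed set, well-separated test points.
x_syn = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
y_syn = [0, 0, 1, 1]
x_test = [(0.05, 0.05), (5.05, 4.95)]
y_test = [0, 1]
print(evaluate_condensed(x_syn, y_syn, x_test, y_test))  # 1.0
```

The key design point is that the test set is always the untouched real data: a condensed set is only as good as the accuracy a model trained on it achieves on the original distribution.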
Comprehensive Datasets
Automated Eval Library
NAS (Neural Architecture Search)
CIFAR10: IPC 1, 10, 50
CIFAR100: IPC 1, 10, 50
TinyImageNet: IPC 1, 10, 50
ImageNet: IPC 1, 2, 10, 50
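IPC (images per class) fixes the size of each condensed set: the number of classes times the IPC value. The snippet below, using standard published training-set sizes for these datasets, shows how small the condensed sets are relative to the originals (e.g., CIFAR10 at IPC 10 is 100 images versus 50,000):

```python
# (number of classes, full training-set size) -- standard published figures.
DATASETS = {
    "CIFAR10": (10, 50_000),
    "CIFAR100": (100, 50_000),
    "TinyImageNet": (200, 100_000),
    "ImageNet": (1_000, 1_281_167),
}

def condensed_size(dataset: str, ipc: int) -> int:
    """Number of images in a condensed set at the given IPC."""
    classes, _ = DATASETS[dataset]
    return classes * ipc

def compression_ratio(dataset: str, ipc: int) -> float:
    """Fraction of the original training set kept."""
    classes, full = DATASETS[dataset]
    return classes * ipc / full

print(condensed_size("CIFAR10", 10))              # 100 images
print(f"{compression_ratio('CIFAR10', 10):.2%}")  # 0.20% of the full set
```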
Citation
@article{cui2022dc,
  title={DC-BENCH: Dataset Condensation Benchmark},
  author={Cui, Justin and Wang, Ruochen and Si, Si and Hsieh, Cho-Jui},
  journal={arXiv preprint arXiv:2207.09639},
  year={2022}
}