Dataset Condensation is a newly emerging technique that aims to learn a tiny dataset capturing the rich information encoded in the original dataset. As the datasets that contemporary machine learning models rely on grow ever larger, condensation methods have become a prominent direction for accelerating network training and reducing data storage. Although numerous methods have been proposed in this rapidly growing field, evaluating and comparing them is non-trivial and remains an open issue.
This work provides the first large-scale standardized benchmark on Dataset Condensation. It consists of a suite of evaluations that comprehensively reflect the generalizability and effectiveness of condensation methods through the lens of their generated datasets. The benchmark library, including evaluators, baseline methods, and generated datasets, is open-sourced in the DC-Bench GitHub repository.
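The core evaluation in condensation benchmarks trains a fresh model on the condensed set and then reports its accuracy on the original, real test set. A minimal sketch of that protocol, using a nearest-centroid classifier as a stand-in for the neural networks actually trained (all function names here are illustrative, not the benchmark's API):

```python
from collections import defaultdict

def evaluate_condensed(x_syn, y_syn, x_test, y_test):
    """Train a simple nearest-centroid classifier on the condensed
    (synthetic) set, then score it on the real test set."""
    # Group condensed points by class label.
    by_class = defaultdict(list)
    for x, y in zip(x_syn, y_syn):
        by_class[y].append(x)
    # One centroid per class: the mean of its condensed points.
    centroids = {
        c: [sum(dim) / len(pts) for dim in zip(*pts)]
        for c, pts in by_class.items()
    }
    def predict(x):
        # Nearest centroid by squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
    correct = sum(predict(x) == y for x, y in zip(x_test, y_test))
    return correct / len(x_test)

# Toy example: 2 classes, IPC 2 condensed set, well-separated test points.
x_syn = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
y_syn = [0, 0, 1, 1]
x_test = [(0.05, 0.05), (5.05, 4.95)]
y_test = [0, 1]
print(evaluate_condensed(x_syn, y_syn, x_test, y_test))  # 1.0
```

The key design point is that the test set is always the untouched real data: a condensed set is only as good as the accuracy a model trained on it achieves on the original distribution.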
Comprehensive Datasets
Automated Eval Library
NAS (Neural Architecture Search)
CIFAR10: IPC 1, 10, 50
CIFAR100: IPC 1, 10, 50
TinyImageNet: IPC 1, 10, 50
ImageNet: IPC 1, 2, 10, 50
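IPC (images per class) fixes the size of each condensed set: the number of classes times the IPC value. The snippet below, using standard published training-set sizes for these datasets, shows how small the condensed sets are relative to the originals (e.g., CIFAR10 at IPC 10 is 100 images versus 50,000):

```python
# (number of classes, full training-set size) -- standard published figures.
DATASETS = {
    "CIFAR10": (10, 50_000),
    "CIFAR100": (100, 50_000),
    "TinyImageNet": (200, 100_000),
    "ImageNet": (1_000, 1_281_167),
}

def condensed_size(dataset: str, ipc: int) -> int:
    """Number of images in a condensed set at the given IPC."""
    classes, _ = DATASETS[dataset]
    return classes * ipc

def compression_ratio(dataset: str, ipc: int) -> float:
    """Fraction of the original training set kept."""
    classes, full = DATASETS[dataset]
    return classes * ipc / full

print(condensed_size("CIFAR10", 10))              # 100 images
print(f"{compression_ratio('CIFAR10', 10):.2%}")  # 0.20% of the full set
```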
Citation
@article{cui2022dc,
  title={DC-BENCH: Dataset Condensation Benchmark},
  author={Cui, Justin and Wang, Ruochen and Si, Si and Hsieh, Cho-Jui},
  journal={arXiv preprint arXiv:2207.09639},
  year={2022}
}