2024
Conference Paper not in Proceedings
Title
DynaDistill: Leveraging Real-Time Feedback for Effective Dataset Distillation
Title Supplement
Paper presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 2024, Seattle, Washington, USA, 16-22 June 2024
Abstract
Dataset Distillation (DD) aims to compress the knowledge contained in a large-scale dataset into a substantially smaller synthetic dataset. Although this synthetic dataset is meticulously crafted to match the performance of the original, producing it poses significant challenges in training efficiency and data utility. In particular, relying on a single source of supervision, such as a sole expert trajectory in MTT or a single model in IDC, inadvertently undermines the potential performance of the distilled synthetic dataset at certain intervals within the distillation process. This inefficiency necessitates a protracted series of training iterations to reach an improved performance outcome. In this paper, we hypothesize that there exists an optimized training routine across the entire optimization phase, specifically for synthetic dataset training through gradient or trajectory matching. To address these challenges, we introduce a novel method, DynaDistill, designed to expedite the distillation process by dramatically decreasing the number of distillation steps required by current state-of-the-art methods without compromising their performance. Our empirical results demonstrate that our method achieves performance on par with state-of-the-art methods. Moreover, its design allows it to integrate seamlessly as a plug-and-play module into existing distillation techniques.
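For context on the gradient-matching objective the abstract refers to, the sketch below illustrates the kind of loss that DD methods such as IDC build on, into which a plug-and-play module like DynaDistill would be inserted. It is a minimal illustration, not the authors' implementation: the function gradient_match_loss, the cosine-distance formulation, and the training setup described in the comments are all assumptions.

```python
# Hypothetical sketch of gradient matching for dataset distillation
# (the objective family behind methods such as IDC); not DynaDistill
# or the authors' code. Names and details are illustrative assumptions.
import torch
import torch.nn.functional as F

def gradient_match_loss(model, real_x, real_y, syn_x, syn_y):
    # Gradients of the training loss on a real batch (treated as targets).
    g_real = torch.autograd.grad(
        F.cross_entropy(model(real_x), real_y), model.parameters()
    )
    # Gradients on the synthetic batch; create_graph=True lets the
    # matching loss backpropagate into the synthetic images themselves.
    g_syn = torch.autograd.grad(
        F.cross_entropy(model(syn_x), syn_y), model.parameters(),
        create_graph=True
    )
    # Sum of per-layer cosine distances between the two gradient sets.
    loss = syn_x.new_zeros(())
    for gr, gs in zip(g_real, g_syn):
        loss = loss + 1.0 - F.cosine_similarity(
            gr.detach().flatten(), gs.flatten(), dim=0
        )
    return loss

# Usage (assumed setup): syn_x is a leaf tensor with requires_grad=True,
# updated by an outer optimizer over many distillation steps, while the
# model is periodically re-initialized or re-trained on (syn_x, syn_y).
```

In this framing, the "protracted series of training iterations" the abstract criticizes corresponds to repeating such matching steps many times against a single model or trajectory; the paper's claim is that real-time feedback can cut down the number of these steps.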
Author(s)
File(s)
Rights
CC BY-SA 4.0: Creative Commons Attribution-ShareAlike
Language
English