2025
Conference Paper
Title
GDCK: Efficient Large-Scale Graph Distillation Utilizing a Model-Free Kernelized Approach
Abstract
Large-scale graph distillation, with applications to social networks and literature citation graphs, has made significant progress in recent years. Existing methods primarily rely on minimizing surrogate objectives, such as gradient or distribution discrepancies between the original and condensed graphs, or on aligning training trajectories. However, these approaches are often computationally intensive, requiring nested optimization loops or the training of multiple expert models to guide student models. To overcome these issues, we propose GDCK, a novel and efficient approach to graph distillation based on neural tangent kernels (NTKs). GDCK combines NTKs with kernel ridge regression, eliminating the need to train graph neural networks and significantly reducing computation time. By applying NTKs both to randomly selected sub-graphs and within individual classes, GDCK preserves the structural information critical to downstream performance. Additionally, it incorporates node importance, effectively compressing nodes whose neighbors exhibit diverse labels into the synthetic graph. Experiments on node classification tasks demonstrate that GDCK converges rapidly in the early training epochs, substantially reducing time costs while maintaining competitive classification accuracy. This makes GDCK a practical and scalable solution for graph distillation in real-world scenarios. Our code is available at: https://github.com/SchenbergZY/GDCK.
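A minimal sketch of the kernel-ridge-regression objective the abstract describes, assuming an arc-cosine kernel as a simple stand-in for the graph NTK. The function names (arccos_kernel, krr_loss), the toy data, and the omission of sub-graph sampling and node-importance weighting are illustrative assumptions, not the authors' implementation:

# Hedged sketch: kernel ridge regression (KRR) with an NTK-style kernel, in the
# spirit of GDCK's model-free objective. The arc-cosine kernel below is only a
# rough proxy for the graph NTK used in the paper.
import numpy as np

def arccos_kernel(X, Y):
    """Arc-cosine kernel (degree 1): closed-form ReLU network kernel,
    standing in here for a neural tangent kernel."""
    nx = np.linalg.norm(X, axis=1, keepdims=True)        # (n, 1)
    ny = np.linalg.norm(Y, axis=1, keepdims=True).T      # (1, m)
    cos = np.clip((X @ Y.T) / (nx * ny + 1e-12), -1.0, 1.0)
    theta = np.arccos(cos)
    return (nx * ny) / np.pi * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

def krr_loss(X_syn, Y_syn, X_real, Y_real, ridge=1e-3):
    """Predict real-node labels from the synthetic nodes via KRR and return
    the squared error -- the quantity a GDCK-style method would minimize
    with respect to the synthetic graph."""
    K_ss = arccos_kernel(X_syn, X_syn)
    K_rs = arccos_kernel(X_real, X_syn)
    alpha = np.linalg.solve(K_ss + ridge * np.eye(len(X_syn)), Y_syn)
    pred = K_rs @ alpha
    return float(np.mean((pred - Y_real) ** 2))

# Toy usage on node features from a hypothetical sampled sub-graph
# (graph structure is omitted; the paper folds it into the kernel).
rng = np.random.default_rng(0)
X_real = rng.normal(size=(200, 16))
Y_real = np.eye(4)[rng.integers(0, 4, size=200)]   # one-hot labels
X_syn = rng.normal(size=(20, 16))                  # condensed nodes (learnable)
Y_syn = np.eye(4)[np.repeat(np.arange(4), 5)]      # balanced synthetic labels
print("KRR loss on sampled nodes:", krr_loss(X_syn, Y_syn, X_real, Y_real))

In a method of this kind, the synthetic node features would then be optimized against this closed-form loss, which is what removes the need to train any graph neural network during distillation.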
Author(s)