2022
Conference Paper
Title
A Comparative Performance Analysis of Fast K-Means Clustering Algorithms
Abstract
Data clustering is a fundamental and widespread problem in computer science that has attracted considerable attention in both the scientific community and application domains. Among the various algorithmic approaches, the k-means algorithm, together with its most prominent implementation, Lloyd's algorithm, has become a de facto standard for partitioning-based clustering. However, this algorithm turns out to be inefficient on very large databases. To mitigate this efficiency issue, several fast k-means algorithms for ad-hoc and exact data clustering have been proposed in the literature. Because their inner workings and pruning criteria differ, it is difficult to predict how efficient an individual algorithm will be in a given application scenario. We therefore present a performance analysis of existing fast k-means algorithms. We focus on simple interpretability and comparability, abstracting from many implementation details, in order to provide a guide for data scientists and practitioners alike.
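The abstract names Lloyd's algorithm as the baseline and refers to pruning criteria without listing them. The following is a minimal Python sketch, not the paper's code, of what "exact" pruning means: it runs plain Lloyd iterations and can optionally skip distance computations using one well-known test, the triangle-inequality lemma from Elkan's 2003 k-means variant (if d(c_a, c_b) >= 2 d(x, c_a), then c_b cannot be closer to x than c_a). The function names `assign` and `lloyd`, the random initialization, and the policy of leaving empty clusters in place are illustrative assumptions.

```python
import math
import random


def assign(points, centers, prune=False):
    """Assign each point to its nearest center; also count distance computations.

    With prune=True, center j is skipped when d(c_best, c_j) >= 2 * d(x, c_best),
    because then d(x, c_j) >= d(x, c_best) by the triangle inequality.
    """
    k = len(centers)
    # Pairwise center distances, used only by the pruning test.
    cc = [[math.dist(a, b) for b in centers] for a in centers]
    labels, computed = [], 0
    for x in points:
        best, best_d = 0, math.dist(x, centers[0])
        computed += 1
        for j in range(1, k):
            if prune and cc[best][j] >= 2 * best_d:
                continue  # c_j provably cannot be strictly closer than c_best
            d = math.dist(x, centers[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed


def lloyd(points, k, iters=100, prune=False, seed=0):
    """Plain Lloyd iterations; returns (centers, labels, total distance computations)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct input points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    total = 0
    for _ in range(iters):
        # Assignment step (the expensive part that fast variants try to prune).
        labels, computed = assign(points, centers, prune)
        total += computed
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centers.append(centers[j])  # assumed: empty cluster keeps its center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels, total
```

Usage: running `lloyd(pts, 3)` and `lloyd(pts, 3, prune=True)` with the same seed yields identical centers and labels, because the test discards only centers that provably cannot be strictly closer; the pruned run simply reports fewer distance computations. This is the sense in which such pruning is exact: it changes the cost of each iteration, not the clustering.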
Author(s)