Hier finden Sie wissenschaftliche Publikationen aus den Fraunhofer-Instituten.

Contraction clustering (raster): A big data algorithm for density-based clustering in constant memory and linear time

: Ulm, G.; Gustavsson, E.; Jirstrand, M.


Nicosia, G.:
Machine learning, optimization, and big data. Third International Conference, MOD 2017 : Volterra, Italy, September 14-17, 2017; Revised selected papers
Cham: Springer International Publishing, 2018 (Lecture Notes in Computer Science 10710)
ISBN: 978-3-319-72925-1 (Print)
ISBN: 978-3-319-72926-8 (Online)
International Workshop on Machine Learning, Optimization, and Big Data (MOD) <3, 2017, Volterra>
Fraunhofer FCC ()

Clustering is an essential data mining tool for analyzing and grouping similar objects. In big data applications, however, many clustering methods are infeasible due to their memory requirements or runtime complexity. Open image in new window (Raster) is a linear-time algorithm for identifying density-based clusters. Its coefficient is negligible as it depends neither on input size nor the number of clusters. Its memory requirements are constant. Consequently, Raster is suitable for big data applications where the size of the data may be huge. It consists of two steps: (1) a contraction step which projects objects onto tiles and (2) an agglomeration step which groups tiles into clusters. Our algorithm is extremely fast. In single-threaded execution on a contemporary workstation, it clusters ten million points in less than 20 s—when using a slow interpreted programming language like Python. Furthermore, Raster is easily parallelizable.