Fraunhofer-Gesellschaft
2018
Conference Paper
Title

Contraction clustering (raster): A big data algorithm for density-based clustering in constant memory and linear time

Abstract
Clustering is an essential data mining tool for analyzing and grouping similar objects. In big data applications, however, many clustering methods are infeasible due to their memory requirements or runtime complexity. Contraction Clustering (Raster) is a linear-time algorithm for identifying density-based clusters. Its coefficient is negligible, as it depends neither on the input size nor on the number of clusters. Its memory requirements are constant. Consequently, Raster is suitable for big data applications where the size of the data may be huge. It consists of two steps: (1) a contraction step, which projects objects onto tiles, and (2) an agglomeration step, which groups tiles into clusters. Our algorithm is extremely fast. In single-threaded execution on a contemporary workstation, it clusters ten million points in less than 20 s, even when using a slow interpreted programming language like Python. Furthermore, Raster is easily parallelizable.
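The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how tiles are defined, when a tile counts as dense, or which tiles are considered adjacent. The function names `contract` and `agglomerate`, the parameters `tile_size`, `min_points`, and `min_tiles`, the count-based density rule, and the 8-neighborhood adjacency are all assumptions made for illustration, and the sketch is limited to 2D points.

```python
from collections import Counter, deque
from math import floor


def contract(points, tile_size=1.0, min_points=2):
    """Contraction step (sketch): project each 2D point onto a grid tile.

    Assumption: a tile is kept only if at least `min_points` points fall
    into it. The abstract does not state the actual density criterion.
    """
    counts = Counter(
        (floor(x / tile_size), floor(y / tile_size)) for x, y in points
    )
    return {tile for tile, n in counts.items() if n >= min_points}


def agglomerate(tiles, min_tiles=1):
    """Agglomeration step (sketch): group touching tiles into clusters.

    Assumption: two tiles are adjacent if they share an edge or a corner
    (8-neighborhood). Clusters are found with a breadth-first search.
    """
    remaining = set(tiles)
    clusters = []
    while remaining:
        start = remaining.pop()
        cluster, queue = {start}, deque([start])
        while queue:
            tx, ty = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (tx + dx, ty + dy)
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        cluster.add(neighbor)
                        queue.append(neighbor)
        # Assumed filter: discard clusters made of too few tiles.
        if len(cluster) >= min_tiles:
            clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    sample = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.5), (1.4, 0.6),
              (10.2, 10.1), (10.3, 10.4), (5.5, 5.5)]
    dense_tiles = contract(sample, tile_size=1.0, min_points=2)
    for cluster in agglomerate(dense_tiles):
        print(sorted(cluster))
```

In this sketch the contraction step makes a single pass over the points, and the state it keeps grows with the number of distinct tiles rather than with the number of points. The isolated point at (5.5, 5.5) falls into a tile with a single point and is therefore dropped before agglomeration.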
Author(s)
Ulm, G.
Gustavsson, E.
Jirstrand, M.
Published in
Machine learning, optimization, and big data. Third International Conference, MOD 2017
Conference
International Workshop on Machine Learning, Optimization, and Big Data (MOD) 2017
DOI
10.1007/978-3-319-72926-8_6
Language
English