2017
Conference Paper
Title

Author clustering using compression-based dissimilarity scores: Notebook for PAN at CLEF 2017

Abstract
The PAN 2017 Author Clustering task examines two application scenarios: complete author clustering and authorship-link ranking. In the first scenario, one must identify the number (k) of different authors within a document collection and assign each document to exactly one of the k clusters, where each cluster corresponds to a different author. In the second scenario, one must establish authorship links between documents in a cluster and provide a list of document pairs, ranked according to a confidence score. We present a simple scheme to handle both scenarios. To group the documents by author, we use k-Medoids, where the optimal k is determined by computing silhouettes. To determine links between the documents in each cluster, we apply a predefined compressor together with a dissimilarity measure. The resulting compression-based dissimilarity scores are then used to rank all document pairs. The proposed scheme does not require (text) preprocessing, feature engineering, or hyperparameter optimization, which are often necessary in author clustering and related fields. However, the achieved results indicate that there is room for improvement.
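To illustrate how compression-based dissimilarity scores can rank document pairs, here is a minimal sketch. The abstract names neither the compressor nor the dissimilarity measure, so both choices below are assumptions made for illustration: zlib stands in for the compressor, and the normalized compression distance (NCD) stands in for the dissimilarity measure. The function names `compressed_size`, `ncd`, and `rank_pairs` are hypothetical. The k-Medoids clustering step is not covered here.

```python
import zlib
from itertools import combinations
from typing import Dict, List, Tuple


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance (assumed measure); lower means more similar."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_pairs(docs: Dict[str, str]) -> List[Tuple[str, str, float]]:
    """Score every document pair and sort by ascending dissimilarity."""
    scored = [
        (a, b, ncd(docs[a], docs[b]))
        for a, b in combinations(sorted(docs), 2)
    ]
    return sorted(scored, key=lambda item: item[2])
```

Pairs at the top of the returned list are the strongest candidates for an authorship link. Raw texts go straight into the compressor, which mirrors the abstract's point that no preprocessing or feature engineering is needed.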
Author(s)
Halvani, O.
Graner, L.
Mainwork
CLEF 2017, Conference and Labs of the Evaluation Forum. Working Notes. Online resource  
Conference
Conference and Labs of the Evaluation Forum (CLEF) 2017  
Language
English
Fraunhofer-Institut für Sichere Informationstechnologie SIT  