2018
Conference Paper
Title
Cross-domain authorship attribution based on compression
Title Supplement
Notebook for PAN at CLEF 2018
Abstract
Authorship attribution (AA) is a well-studied research subject and the most prominent subtask of authorship analysis. The goal of AA is to identify the most likely author of an anonymous document among a set of known candidate authors for whom sample documents exist. Even after more than a century of intensive research, AA is still far from being solved. One open question, for example, is whether the goal of AA can be achieved when the anonymous document and the known sample documents come from different domains, such as genre or topic. We present a lightweight authorship attribution approach named COBAA ("Compression-Based Authorship Attribution"), which is an attempt to answer this question. COBAA relies solely on a compression algorithm and a simple similarity measure and does not involve a training procedure. Therefore, the method can be used out of the box, even in real-world scenarios where no training data is available. COBAA was evaluated at the PAN 2018 Author Identification shared task and ranked third among 11 participating approaches. The method achieved a Mean Macro-F1 of 0.629 on a corpus of attribution problems distributed across five languages (English, French, Italian, Polish and Spanish).
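The general idea of training-free, compression-based attribution can be illustrated with a minimal sketch. The abstract does not name the compressor or the similarity measure used by COBAA, so the choices below are assumptions made purely for illustration: `zlib` as the compressor and the normalized compression distance (NCD) as the measure. The function names (`compressed_size`, `ncd`, `attribute`) are hypothetical and do not come from the paper.

```python
import zlib
from typing import Mapping, Sequence


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    Assumed similarity measure, not necessarily the one used by COBAA.
    Lower values mean the texts share more compressible structure.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def attribute(unknown: str, candidates: Mapping[str, Sequence[str]]) -> str:
    """Return the candidate whose known documents are closest to `unknown`.

    No training step: each author's samples are concatenated and compared
    directly against the anonymous document.
    """
    scores = {
        author: ncd(unknown, "\n".join(docs))
        for author, docs in candidates.items()
    }
    return min(scores, key=scores.get)
```

A call such as `attribute(anonymous_text, {"author_a": [doc1, doc2], "author_b": [doc3]})` returns the name of the closest candidate. Because nothing is fitted to data, a sketch like this runs unchanged across languages and domains, although its accuracy depends heavily on the compressor and on document length.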