An efficient intrinsic authorship verification scheme based on ensemble learning

Halvani, O.; Steinebach, M.

doi:10.1109/ARES.2014.84

2014

Conference Paper

Abstract

Authorship Verification is an important subdiscipline of digital text forensics. Its goal is to decide whether two texts were written by the same author. We present an efficient Authorship Verification scheme based on an ensemble of K-Nearest Neighbor classifiers, where each classifier generates a decision regarding one feature category. Our scheme provides several benefits. It is independent of linguistic resources such as thesauri or language models. It can handle different Indo-European languages, such as English, German, Spanish, Greek, Dutch, Swedish and French. Its runtime is low, because no deep linguistic processing (tagging, chunking, parsing, etc.) is performed. Moreover, the scheme can easily be modified, for example by replacing the distance function, the acceptance criterion, or the features and their parameters. The proposed scheme is evaluated against the publicly available PAN-2013 Author Identification (AI) test corpus, where it ranked second in the top ten list, as well as against five other test corpora that we compiled ourselves. Our experiments show that it is possible to achieve promising results, even when using a fixed setting of parameters and features across seven different languages.
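
The general idea of one decision per feature category, combined into an ensemble verdict, can be illustrated with a minimal sketch. This is not the authors' implementation: the three feature categories, the Manhattan distance, the fixed threshold of 1.0 and the majority vote are all assumptions chosen for illustration, and a simple thresholded distance stands in for each K-Nearest Neighbor classifier. All function names are hypothetical.

```python
from collections import Counter
import string


def _normalize(counts):
    """Turn raw counts into relative frequencies (empty dict if no counts)."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


# Three illustrative feature categories (assumed, not taken from the paper).
# None of them needs tagging, chunking or parsing.
def char_trigrams(text):
    text = text.lower()
    return _normalize(Counter(text[i:i + 3] for i in range(len(text) - 2)))


def punctuation_marks(text):
    return _normalize(Counter(ch for ch in text if ch in string.punctuation))


def word_unigrams(text):
    return _normalize(Counter(text.lower().split()))


FEATURE_CATEGORIES = (char_trigrams, punctuation_marks, word_unigrams)


def manhattan(p, q):
    """Manhattan distance between two sparse frequency profiles."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def verify(known_text, unknown_text, distance=manhattan, threshold=1.0):
    """Return True if a majority of feature categories accept the pair.

    Each category casts one vote: it accepts when the distance between
    the two profiles does not exceed the threshold (an assumed
    acceptance criterion).
    """
    votes = [
        distance(extract(known_text), extract(unknown_text)) <= threshold
        for extract in FEATURE_CATEGORIES
    ]
    return sum(votes) > len(votes) / 2
```

Because the distance function and threshold are plain parameters and the feature categories are an ordinary tuple of functions, any of them can be swapped without touching the voting logic, which mirrors the modifiability described in the abstract.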

Author(s)

Halvani, O.

Steinebach, M.

Mainwork

Ninth International Conference on Availability, Reliability and Security, ARES 2014

Conference

International Conference on Availability, Reliability, and Security (ARES) 2014

International Workshop on Digital Forensics (WSDF) 2014
