An efficient intrinsic authorship verification scheme based on ensemble learning
Authorship Verification is an important subdiscipline of digital text forensics. Its goal is to decide whether two texts were written by the same author. We present an efficient Authorship Verification scheme based on an ensemble of K-Nearest Neighbor classifiers, where each classifier generates a decision for one feature category. Our scheme offers several benefits. It does not depend on linguistic resources such as thesauruses or language models. It can handle different Indo-European languages, for instance English, German, Spanish, Greek, Dutch, Swedish, and French. Its runtime is low because it avoids deep linguistic processing (tagging, chunking, parsing, etc.). Moreover, it can easily be modified, for example by replacing the distance function, the acceptance criterion, or the features and their parameters. We evaluate the proposed scheme on the publicly available PAN-2013 Author Identification (AI) test corpus, where it was ranked second in the top-ten list, as well as on five other test corpora that we compiled ourselves. Our experiments show that promising results can be achieved even with a fixed setting of parameters and features across seven different languages.
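To make the ensemble idea concrete, the following is a minimal sketch and not the scheme evaluated here. It assumes that each feature category yields a relative-frequency profile, that a per-category decision accepts when the distance between the known and the unknown text stays below a threshold (a simplified stand-in for the K-Nearest Neighbor classifiers, whose details are not given above), and that the ensemble decides by majority vote. The function names, the feature categories (character n-grams, punctuation), the Manhattan distance, and the threshold values are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

Features = Dict[str, float]
Extractor = Callable[[str], Features]
Distance = Callable[[Features, Features], float]


def relative_frequencies(items: List[str]) -> Features:
    """Turn a list of observed items into a relative-frequency profile."""
    counts = Counter(items)
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


def char_ngrams(n: int) -> Extractor:
    """Hypothetical feature category: lowercase character n-grams."""
    def extract(text: str) -> Features:
        text = text.lower()
        return relative_frequencies(
            [text[i:i + n] for i in range(len(text) - n + 1)]
        )
    return extract


def punctuation(text: str) -> Features:
    """Hypothetical feature category: punctuation marks."""
    return relative_frequencies([c for c in text if c in ".,;:!?-'\"()"])


def manhattan(a: Features, b: Features) -> float:
    """Assumed distance function; any other distance can be passed instead."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


def verify(
    known: str,
    unknown: str,
    categories: Dict[str, Extractor],
    thresholds: Dict[str, float],
    distance: Distance = manhattan,
) -> bool:
    """Return True if a strict majority of feature categories accept."""
    votes = [
        # Assumed acceptance criterion: distance at or below the threshold.
        distance(extract(known), extract(unknown)) <= thresholds[name]
        for name, extract in categories.items()
    ]
    return sum(votes) > len(votes) / 2


# Illustrative configuration; the threshold values are placeholders.
CATEGORIES: Dict[str, Extractor] = {
    "char_2grams": char_ngrams(2),
    "char_3grams": char_ngrams(3),
    "punctuation": punctuation,
}
THRESHOLDS: Dict[str, float] = {name: 0.5 for name in CATEGORIES}
```

Calling `verify(known_text, unknown_text, CATEGORIES, THRESHOLDS)` returns one same-author decision. Because the distance function, the thresholds, and the extractors are plain arguments, each can be swapped without touching the voting logic, which mirrors the modifiability claimed above. Nothing in the sketch requires tagging, chunking, or parsing.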