VEBAV - a simple, scalable and fast authorship verification scheme

Halvani, Oren; Steinebach, Martin

2014

Conference Paper

Abstract

We present VEBAV, a simple, scalable and fast authorship verification scheme for the Author Identification (AI) task within the PAN-2014 competition. VEBAV (VEctor-Based Authorship Verifier), a modification of our existing PAN-2013 approach, is an intrinsic one-class verification method based on a simple distance function. VEBAV offers several benefits. First, it is independent of linguistic resources and tools such as ontologies, thesauruses, language models, dictionaries and spellcheckers. Second, it has a low runtime, because it does not use deep linguistic processing techniques such as POS tagging, chunking or parsing. Third, it can handle more than one language: it can be applied to documents written in Indo-European languages such as Dutch, English, Greek or Spanish. VEBAV can also be extended or modified easily by replacing its underlying components, which include the distance function (required for classification), the acceptance criterion, and the underlying features together with their parameters, among others. In our experiments on a 20% split of the PAN-2014 AI training corpus, we achieved an overall accuracy of 65.83% (in detail: 80% for Dutch essays, 55% for Dutch reviews, 55% for English essays, 80% for English novels, 70% for Greek articles and 55% for Spanish articles).
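To make the idea of an intrinsic, distance-based one-class verifier more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the features, the distance function or the acceptance criterion, so everything here is an assumption chosen for illustration: character trigram relative frequencies as features, the Manhattan distance, a fixed threshold as acceptance criterion, and the hypothetical function names `char_ngram_profile`, `manhattan_distance` and `verify`.

```python
from collections import Counter
from typing import Dict, List


def char_ngram_profile(text: str, n: int = 3) -> Dict[str, float]:
    """Map a text to relative frequencies of its character n-grams.

    Assumption: character n-grams stand in for whatever features
    the actual system uses; no linguistic resources are needed.
    """
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {gram: count / total for gram, count in counts.items()}


def manhattan_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Sum of absolute differences over the union of both feature sets."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def verify(known_docs: List[str], unknown_doc: str,
           threshold: float = 1.0, n: int = 3) -> bool:
    """Accept the unknown document if it is close enough to the known ones.

    Intrinsic: only the documents of the problem itself are used,
    with no external corpus. The threshold value is a placeholder
    and would have to be tuned on training data.
    """
    known_profile = char_ngram_profile(" ".join(known_docs), n)
    unknown_profile = char_ngram_profile(unknown_doc, n)
    return manhattan_distance(known_profile, unknown_profile) <= threshold
```

In this sketch each component (feature extractor, distance function, acceptance criterion) is a separate function, so any one of them could be swapped out without touching the others, which mirrors the modularity described in the abstract.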

Author(s)

Halvani, Oren

Steinebach, Martin

Main work

CLEF 2014. Working Notes. Online resource

Conference

Conference and Labs of the Evaluation Forum (CLEF) 2014
