Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

VEBAV - a simple, scalable and fast authorship verification scheme

Notebook for PAN at CLEF 2014
 
Authors: Halvani, Oren; Steinebach, Martin


Cappellato, L.:
CLEF 2014. Working Notes. Online resource : Sheffield, UK, September 15-18, 2014
Sheffield, 2014 (CEUR Workshop Proceedings 1180)
http://ceur-ws.org/Vol-1180/
pp. 1049-1062
Conference and Labs of the Evaluation Forum (CLEF) <2014, Sheffield>
English
Conference Paper, Electronic Publication
Fraunhofer SIT
authorship verification; one-class-classification

Abstract
We present VEBAV, a simple, scalable and fast authorship verification scheme for the Author Identification (AI) task within the PAN-2014 competition. VEBAV (VEctor-Based Authorship Verifier), a modification of our existing PAN-2013 approach, is an intrinsic one-class verification method based on a simple distance function. VEBAV offers a number of benefits, for instance independence from linguistic resources and tools such as ontologies, thesauruses, language models, dictionaries, spellcheckers, etc. Another benefit is the low runtime of the method, which results from avoiding deep linguistic processing techniques such as POS tagging, chunking or parsing. A further benefit is the ability to handle more than one language: VEBAV can be applied to documents written in Indo-European languages such as Dutch, English, Greek or Spanish. VEBAV can also be extended or modified easily by replacing its underlying components. These include, among others, the distance function (required for classification), the acceptance criterion, and the underlying features together with their parameters. On a 20% split of the PAN 2014 AI training corpus, we achieved an overall accuracy of 65.83% in our experiments (in detail: 80% for Dutch essays, 55% for Dutch reviews, 55% for English essays, 80% for English novels, 70% for Greek articles and 55% for Spanish articles).
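
The abstract describes a verifier built from replaceable parts: features, a distance function and an acceptance criterion. The following Python sketch illustrates that general structure only. The abstract does not specify the concrete components, so the character n-gram features, the Manhattan distance, the fixed threshold and all function names here are assumptions chosen for illustration, not the authors' configuration.

```python
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def char_ngram_profile(text: str, n: int = 3) -> Vector:
    """Relative frequencies of character n-grams (assumed feature choice)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {gram: count / total for gram, count in counts.items()}


def manhattan(a: Vector, b: Vector) -> float:
    """Manhattan distance between two sparse vectors (assumed distance)."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


def verify(
    known_texts: List[str],
    unknown_text: str,
    features: Callable[[str], Vector] = char_ngram_profile,
    distance: Callable[[Vector, Vector], float] = manhattan,
    threshold: float = 1.0,  # hypothetical acceptance criterion
) -> bool:
    """Accept the unknown text as same-author if it is close enough
    to the profile built from the known texts of that author."""
    known_profile = features(" ".join(known_texts))
    unknown_profile = features(unknown_text)
    return distance(known_profile, unknown_profile) <= threshold
```

Because the features and the distance are passed in as plain functions, either can be swapped without touching `verify`, for example `verify(known, unknown, distance=my_distance, threshold=0.4)` with a hypothetical `my_distance`. A usable threshold would have to be tuned on training data.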

URL: http://publica.fraunhofer.de/documents/N-375112.html