Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

Bot and Gender Identification in Twitter using Word and Character N-Grams

Notebook for PAN at CLEF 2019
 
Authors: Vogel, Inna; Jiang, Peter


Cappellato, L.:
CLEF 2019, Conference and Labs of the Evaluation Forum. Working Notes. Online resource : Lugano, Switzerland, September 9-12, 2019
Lugano, 2019 (CEUR Workshop Proceedings 2380)
http://ceur-ws.org/Vol-2380/
Paper 65, 9 pp.
Conference and Labs of the Evaluation Forum (CLEF) <2019, Lugano>
Bundesministerium für Bildung und Forschung BMBF (Germany)

Detecting and combating disinformation on the Internet
English
Conference Paper, Electronic Publication
Fraunhofer SIT

Abstract
Automated social media accounts, called bots, have gained considerable importance worldwide in recent years. Social bots can have serious implications for society by swaying political elections or spreading disinformation, which motivates social bot detection as an emerging research area. Hence, tools and techniques to automatically detect and classify manipulative bots are needed. In this notebook, we describe our system for the author profiling task at PAN 2019 on bot and gender identification on Twitter. The submitted system uses word unigrams and bigrams as well as character n-grams as features. Tweets were preprocessed and features constructed to train a linear Support Vector Machine (SVM) classifier. Our model shows that it is possible to differentiate bots from humans with fairly high accuracy. The results also show that the SVM can reliably determine the gender of the author (male or female). Our submitted model achieved an accuracy of 0.92 for bot detection on the English dataset and 0.91 on Spanish tweets. Gender was identified with an accuracy of 0.82 on the English corpus and 0.78 on the Spanish corpus. Our simple model ranked 8th out of 55 competitors.
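
The feature families named in the abstract (word unigrams and bigrams, plus character n-grams) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the whitespace tokenizer, the lowercasing, the 3-to-5 character n-gram range, and the function names `word_ngrams`, `char_ngrams`, and `extract_features` are all assumptions made for illustration, since the abstract does not state these details. Classifier training is omitted.

```python
from collections import Counter


def word_ngrams(text, n_values=(1, 2)):
    """Count word n-grams (unigrams and bigrams by default)."""
    # Naive lowercase + whitespace tokenizer: an assumption, not the paper's preprocessing.
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def char_ngrams(text, n_min=3, n_max=5):
    """Count character n-grams; the 3..5 range is an illustrative assumption."""
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def extract_features(text):
    """Merge both feature families into one sparse dict, prefixed by type."""
    features = {}
    for gram, count in word_ngrams(text).items():
        features["w:" + gram] = count
    for gram, count in char_ngrams(text).items():
        features["c:" + gram] = count
    return features


# Hypothetical example input, not taken from the PAN 2019 data.
features = extract_features("Bots tweet a lot")
```

A sparse count dictionary like `features` is the kind of input that would then be vectorized and passed to a linear SVM; the prefixes `w:` and `c:` simply keep word and character features from colliding.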

URL: http://publica.fraunhofer.de/documents/N-569801.html