Here you will find scientific publications from the Fraunhofer Institutes.

Token level code-switching detection using Wikipedia as a lexical resource

Authors: Claeser, Daniel; Felske, Dennis; Kent, Samantha


Rehm, G.:
Language Technologies for the Challenges of the Digital Age. 27th International Conference, GSCL 2017 : Berlin, Germany, September 13-14, 2017, Proceedings
Cham: Springer International Publishing, 2017 (Lecture Notes in Computer Science 10713)
ISBN: 978-3-319-73705-8 (Print)
ISBN: 978-3-319-73706-5 (Online)
ISBN: 3-319-73705-8
International Conference on Language Technologies for the Challenges of the Digital Age (GSCL) <27, 2017, Berlin>
Conference Paper, Electronic Publication
Fraunhofer FKIE

We present a novel lexicon-based classification approach for code-switching detection on Twitter. The main aim is to develop a simple lexical look-up classifier based on frequency information retrieved from Wikipedia. We evaluate the classifier on three language pairs: Spanish-English, Dutch-English, and German-Turkish. The results indicate that our figures for Spanish-English are competitive with current state-of-the-art classifiers, even though the approach is simple and based solely on word frequency information.
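
The idea of a frequency-based lexical look-up can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy corpora standing in for Wikipedia text, the use of relative frequencies, and the "unknown" fallback label are all assumptions made for illustration. Each token is simply assigned the language in which it has the higher relative frequency.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each lowercased word to its share of all tokens in the corpus."""
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def classify_token(token, freq_by_lang, default="unknown"):
    """Return the language whose lexicon gives the token the highest frequency.

    Assumption: tokens absent from every lexicon get the `default` label.
    On an exact tie, the language listed first in `freq_by_lang` wins.
    """
    word = token.lower()
    scores = {lang: freqs.get(word, 0.0) for lang, freqs in freq_by_lang.items()}
    best_lang = max(scores, key=scores.get)
    if scores[best_lang] == 0.0:
        return default
    return best_lang


def tag_tokens(tokens, freq_by_lang):
    """Label every token of a tokenized message independently."""
    return [(t, classify_token(t, freq_by_lang)) for t in tokens]


# Toy corpora (hypothetical stand-ins for per-language Wikipedia text).
TOY_EN = "the cat is on the table and the dog is here".split()
TOY_ES = "el gato está en la mesa y el perro está aquí".split()
FREQS = {
    "en": relative_frequencies(TOY_EN),
    "es": relative_frequencies(TOY_ES),
}

if __name__ == "__main__":
    print(tag_tokens(["The", "gato", "is", "aquí", "lol"], FREQS))
```

In this sketch each token is labeled in isolation, so no context from neighboring words is used; how ties, unseen words, and words shared between languages are actually handled in the paper is not stated in the abstract.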