Fraunhofer-Gesellschaft
2019
Conference Paper
Title

Incorporating code-switching and borrowing in Dutch-English automatic language detection on Twitter

Abstract
This paper presents a classification system that automatically identifies the language of individual tokens in Dutch-English bilingual tweets. The system is built on a dictionary-based approach, with additional features introduced to address the challenges of distinguishing closely related languages. Crucially, a separate system aimed specifically at differentiating between code-switching (CS) and borrowing is designed and then implemented as a classification step within the language identification (LID) system. This classification step is based on a linguistic framework for distinguishing between borrowing and CS. To test the effectiveness of the rules in the LID system, they are used to create feature vectors for training and testing machine learning systems. The discussion centres on a Decision Tree Classifier (DTC) and Support Vector Machines (SVM). The results show only a small difference between the rule-based LID system (micro F1 = .95) and the DTC (micro F1 = .96).
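As a rough illustration of the two ideas the abstract names, dictionary-based token labelling and micro-averaged F1 scoring, the following minimal sketch may help. It is not the authors' system: the word lists, label names, and function names (`label_token`, `label_tweet`, `micro_f1`) are hypothetical, and the paper's additional features and its borrowing versus code-switching step are not reproduced here.

```python
# Toy word lists: hypothetical stand-ins for full Dutch and English dictionaries.
DUTCH = {"ik", "heb", "een", "nieuwe", "gekocht", "is", "in"}
ENGLISH = {"i", "have", "a", "new", "laptop", "is", "in", "love"}


def label_token(token):
    """Label one token by dictionary lookup alone (an assumed, simplified rule)."""
    word = token.lower()
    in_nl = word in DUTCH
    in_en = word in ENGLISH
    if in_nl and in_en:
        # Words shared by both languages are the hard case for related languages.
        return "ambiguous"
    if in_nl:
        return "nl"
    if in_en:
        return "en"
    return "unknown"


def label_tweet(text):
    """Split on whitespace and label every token."""
    return [(tok, label_token(tok)) for tok in text.split()]


def micro_f1(gold, predicted):
    """Micro-averaged F1: pool true/false positives and false negatives over all labels."""
    tp = fp = fn = 0
    for label in set(gold) | set(predicted):
        for g, p in zip(gold, predicted):
            if p == label and g == label:
                tp += 1
            elif p == label and g != label:
                fp += 1
            elif g == label and p != label:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(label_tweet("Ik heb een nieuwe laptop gekocht"))
    print(micro_f1(["nl", "nl", "en", "nl"], ["nl", "en", "en", "nl"]))
```

In this toy setup, a word such as "is" appears in both lists and is therefore left as "ambiguous"; resolving such cases is where additional features of the kind the abstract mentions would be needed. When every token carries exactly one label, the micro-averaged F1 computed this way equals plain token accuracy.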
Author(s)
Kent, S.
Claeser, D.
Mainwork
Future Technologies Conference, FTC 2018. Proceedings. Vol.1  
Conference
Future Technologies Conference (FTC) 2018  
DOI
10.1007/978-3-030-02686-8_32
Language
English
Fraunhofer-Institut für Kommunikation, Informationsverarbeitung und Ergonomie FKIE  