2021
Conference Paper
Title

Data-driven identification of idioms in song lyrics

Abstract
The automatic recognition of idioms poses a challenging problem for NLP applications. Whereas native speakers can intuitively handle multiword expressions whose compositional meanings are hard to trace back to individual word semantics, there is still ample scope for improvement regarding computational approaches. We assume that idiomatic constructions can be characterized by gradual intensities of semantic non-compositionality, formal fixedness, and unusual usage context, and introduce a number of measures for these characteristics, comprising count-based and predictive collocation measures together with measures of context (un)similarity. We evaluate our approach on a manually labelled gold standard, derived from a corpus of German pop lyrics. To this end, we apply a Random Forest classifier to analyze the individual contribution of features for automatically detecting idioms, and study the trade-off between recall and precision. Finally, we evaluate the classifier on an independent dataset of idioms extracted from a list of Wikipedia idioms, achieving state-of-the-art accuracy.
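As a rough illustration of what a count-based collocation measure can look like, the sketch below computes pointwise mutual information (PMI) for adjacent word pairs. PMI is used here only as a familiar example of this family of measures; the abstract does not state which measures the authors use, and the function name `pmi_scores` and the toy English token sequence are hypothetical, not taken from the paper or its German lyrics corpus.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Return PMI (log base 2) for every adjacent word pair in `tokens`.

    Illustrative only: PMI is one common count-based collocation measure,
    assumed here for the example rather than taken from the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bi          # relative frequency of the pair
        p_w1 = unigrams[w1] / n_uni    # relative frequency of the first word
        p_w2 = unigrams[w2] / n_uni    # relative frequency of the second word
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy token sequence (hypothetical, not from the paper's corpus).
toy_tokens = "kick the bucket and kick the ball then fill the bucket".split()
scores = pmi_scores(toy_tokens)

# Rank candidate pairs from strongest to weakest association.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

On very small samples PMI tends to favor pairs of rare words, so a score like this would plausibly serve as one input feature among several rather than as a stand-alone idiom detector.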
Author(s)
Amin, Miriam
Fraunhofer-Zentrum für Internationales Management und Wissensökonomie IMW  
Fankhauser, Peter
Kupietz, Marc
Schneider, Roman
Mainwork
17th Workshop on Multiword Expressions, MWE 2021. Proceedings of the workshop  
Conference
Workshop on Multiword Expressions (MWE) 2021  
Language
English