2025
Conference Paper
Title
LLM Sensitivity Challenges in Abusive Language Detection: Instruction-Tuned vs. Human Feedback
Abstract
The capacity of large language models (LLMs) to understand and distinguish socially unacceptable texts enables them to play a promising role in abusive language detection. However, various factors can affect their sensitivity. In this work, we test whether LLMs exhibit unintended bias in abusive language detection, i.e., whether they predict a given abusive class more or less often than expected in zero-shot settings. Our results show that instruction-tuned LLMs tend to under-predict positive classes, since the datasets used for tuning are dominated by the negative class. In contrast, models fine-tuned with human feedback tend to be overly sensitive. In an exploratory approach to mitigating these issues, we show that stating the label frequency in the prompt helps counter the significant over-prediction.
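
The label-frequency mitigation mentioned at the end of the abstract lends itself to a short illustration. The following is a minimal Python sketch of a zero-shot abusive-language classifier with an optional frequency hint in the prompt; the prompt wording, the query_llm callable, and the 12% positive rate are hypothetical placeholders, not the paper's actual prompts or models.

from typing import Callable, Optional

LABELS = ("not abusive", "abusive")

def build_prompt(text: str, positive_rate: Optional[float] = None) -> str:
    """Compose a zero-shot classification prompt.

    If positive_rate is given, a label-frequency hint is prepended,
    mirroring the mitigation described in the abstract.
    """
    hint = ""
    if positive_rate is not None:
        hint = (f"Note: roughly {positive_rate:.0%} of the texts in this "
                "dataset are abusive; the rest are not.\n")
    return (f"{hint}Classify the following text as 'abusive' or 'not abusive'. "
            f"Answer with the label only.\n\nText: {text}\nLabel:")

def classify(text: str,
             query_llm: Callable[[str], str],
             positive_rate: Optional[float] = None) -> str:
    """Send the prompt to the model and map its free-form answer to a label."""
    answer = query_llm(build_prompt(text, positive_rate)).strip().lower()
    # Check for the negation first, since "not abusive" contains "abusive".
    return LABELS[0] if answer.startswith("not") else LABELS[1]

if __name__ == "__main__":
    # Dummy model call so the sketch runs without an actual LLM backend.
    fake_llm = lambda prompt: "not abusive"
    print(classify("an example sentence", fake_llm, positive_rate=0.12))
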
Author(s)
Zhang, Yaqi (Technische Universität München)
Hangya, Viktor (Fraunhofer-Institut für Integrierte Schaltungen IIS)
Fraser, Alexander (Technische Universität München)
Mainwork
COLING 2025, the 31st International Conference on Computational Linguistics. Proceedings of the Main Conference
Conference
International Conference on Computational Linguistics 2025
Language
English
Institute
Fraunhofer-Institut für Integrierte Schaltungen IIS