2022
Conference Paper
Title

DeTox: A Comprehensive Dataset for German Offensive Language and Conversation Analysis

Abstract
In this work, we present a publicly available offensive language dataset (the DeTox dataset) containing 10,278 annotated German social media comments collected in the first half of 2021. With twelve annotation categories labeled by six annotators, it is far more comprehensive than other datasets and goes beyond hate speech detection alone. In particular, the labels also cover toxicity, criminal relevance, and types of discrimination. Furthermore, about half of the comments come from coherent parts of conversations, which makes it possible to take a comment's context into account and to run conversation analyses that study how offensive language spreads within a conversation. The dataset is available in our GitHub repository: https://github.com/hdaSprachtechnologie/detox
Author(s)
Demus, Christoph
Fraunhofer-Institut für Sichere Informationstechnologie SIT  
Pitz, Jonas
Schütz, Mina
Probol, Nadine
Siegel, Melanie
Labudde, Dirk  
Fraunhofer-Institut für Sichere Informationstechnologie SIT  
Mainwork
WOAH 2022, the Sixth Workshop on Online Abuse and Harms. Proceedings  
Conference
Workshop on Online Abuse and Harms 2022  
Language
English