DeepHateExplainer: Explainable Hate Speech Detection in Under-resourced Bengali Language

Authors: Karim, Md. Rezaul; Dey, Sumon Kanti; Islam, Tanhim; Sarker, Sagor; Menon, Mehadi Hasan; Hossain, Kabir; Hossain, Md. Azam; Decker, Stefan
Year: 2021
Repository record date: 2022-08-16
Handle: https://publica.fraunhofer.de/handle/publica/419799
DOI: 10.1109/DSAA53316.2021.9564230
Language: en
Type: conference paper

Abstract

In this paper, we propose an explainable approach for hate speech detection from the under-resourced Bengali language, which we call DeepHateExplainer. In our approach, Bengali texts are first comprehensively preprocessed before being classified into political, personal, geopolitical, and religious hates using a neural ensemble of transformer-based architectures (i.e., monolingual Bangla BERT-base, multilingual BERT-cased/uncased, and XLM-RoBERTa). Subsequently, the most and least important terms are identified using sensitivity analysis and layer-wise relevance propagation (LRP), before human-interpretable explanations are provided [1]. Finally, we compute comprehensiveness and sufficiency scores to measure the quality of the explanations with respect to faithfulness. Evaluations against machine learning (linear and tree-based models) and neural network (i.e., CNN, Bi-LSTM, and Conv-LSTM with word embeddings) baselines yield F1-scores of 78%, 91%, 89%, and 84% for political, personal, geopolitical, and religious hates, respectively, outperforming both ML and DNN baselines [2].

[1] To foster reproducible research, we make available the data, source code, models, and notebooks: https://github.com/rezacsedu/DeepHateExplainer
[2] Read an extended version of this paper: https://arxiv.org/abs/2012.14353
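
The comprehensiveness and sufficiency scores mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions of these faithfulness metrics (comprehensiveness: the drop in predicted probability when the explanation's terms are removed; sufficiency: the drop when only those terms are kept). The classifier here is a hypothetical toy scorer with made-up weights, and the names `TOY_WEIGHTS`, `predict_proba`, `comprehensiveness`, and `sufficiency` are illustrative only; none of this is taken from the DeepHateExplainer code.

```python
import math

# Hypothetical toy weights standing in for a real classifier (an assumption,
# not the paper's model). Unknown tokens contribute a weight of 0.
TOY_WEIGHTS = {"hate": 2.0, "awful": 1.0, "nice": -1.5}


def predict_proba(tokens):
    """Toy probability that the text belongs to the target class."""
    score = sum(TOY_WEIGHTS.get(token, 0.0) for token in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def comprehensiveness(tokens, rationale):
    """Probability drop after removing the rationale tokens from the input.

    A larger value suggests the rationale tokens mattered to the prediction.
    """
    reduced = [token for token in tokens if token not in rationale]
    return predict_proba(tokens) - predict_proba(reduced)


def sufficiency(tokens, rationale):
    """Probability drop when only the rationale tokens are kept.

    A value near zero suggests the rationale alone supports the prediction.
    """
    kept = [token for token in tokens if token in rationale]
    return predict_proba(tokens) - predict_proba(kept)


tokens = ["those", "people", "are", "awful", "hate"]
rationale = {"awful", "hate"}  # terms an explanation method flagged as important

comp = comprehensiveness(tokens, rationale)
suff = sufficiency(tokens, rationale)
print(f"comprehensiveness={comp:.3f} sufficiency={suff:.3f}")
```

In this toy setting, a faithful explanation would show high comprehensiveness and low sufficiency; with a real model, `predict_proba` would be replaced by the classifier's predicted probability for the originally predicted class.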