DeepHateExplainer: Explainable Hate Speech Detection in Under-resourced Bengali Language

Karim, Md. Rezaul; Dey, Sumon Kanti; Islam, Tanhim; Sarker, Sagor; Menon, Mehadi Hasan; Hossain, Kabir; Hossain, Md. Azam; Decker, Stefan

doi:10.1109/DSAA53316.2021.9564230

2021

Conference Paper

Abstract

In this paper, we propose an explainable approach for hate speech detection in the under-resourced Bengali language, which we call DeepHateExplainer. In our approach, Bengali texts are first comprehensively preprocessed and then classified into political, personal, geopolitical, and religious hate using a neural ensemble of transformer-based architectures (i.e., monolingual Bangla BERT-base, multilingual BERT-cased/uncased, and XLM-RoBERTa). Subsequently, the most and least important terms are identified using sensitivity analysis and layer-wise relevance propagation (LRP), before providing human-interpretable explanations. Finally, we compute comprehensiveness and sufficiency scores to measure the quality of explanations with respect to faithfulness. Evaluations against machine learning (linear and tree-based models) and neural network (i.e., CNN, Bi-LSTM, and Conv-LSTM with word embeddings) baselines yield F1-scores of 78%, 91%, 89%, and 84% for political, personal, geopolitical, and religious hate, respectively, outperforming both ML and DNN baselines.

To foster reproducible research, we make available the data, source code, models, and notebooks: https://github.com/rezacsedu/DeepHateExplainer

An extended version of this paper is available at https://arxiv.org/abs/2012.14353
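The abstract names comprehensiveness and sufficiency as faithfulness measures but does not give their formulas. A minimal sketch follows, assuming the commonly used definitions: comprehensiveness is the drop in the predicted-class probability when the rationale tokens are removed, and sufficiency is the drop when only the rationale tokens are kept. The names `toy_predict_proba` and `HYPOTHETICAL_LEXICON`, along with the toy scoring rule, are illustrative assumptions and are not taken from the paper or its repository.

```python
from typing import Callable, List, Sequence, Set

# A classifier is modelled as any function mapping tokens to class probabilities.
PredictFn = Callable[[Sequence[str]], List[float]]


def comprehensiveness(predict: PredictFn, tokens: Sequence[str],
                      rationale: Set[int], label: int) -> float:
    """Probability drop for `label` after removing the rationale tokens."""
    full = predict(tokens)[label]
    without = [t for i, t in enumerate(tokens) if i not in rationale]
    return full - predict(without)[label]


def sufficiency(predict: PredictFn, tokens: Sequence[str],
                rationale: Set[int], label: int) -> float:
    """Probability drop for `label` when only the rationale tokens are kept."""
    full = predict(tokens)[label]
    only = [t for i, t in enumerate(tokens) if i in rationale]
    return full - predict(only)[label]


# Hypothetical stand-in for a trained model (assumption, for illustration only):
# the probability of class 1 grows with the number of lexicon hits.
HYPOTHETICAL_LEXICON = {"bad", "awful"}


def toy_predict_proba(tokens: Sequence[str]) -> List[float]:
    hits = sum(1 for t in tokens if t in HYPOTHETICAL_LEXICON)
    p1 = min(1.0, 0.1 + 0.4 * hits)
    return [1.0 - p1, p1]


tokens = ["this", "bad", "and", "awful", "thing"]
rationale = {1, 3}  # indices of the terms an explainer marked as most important

comp = comprehensiveness(toy_predict_proba, tokens, rationale, label=1)
suff = sufficiency(toy_predict_proba, tokens, rationale, label=1)
print(f"comprehensiveness={comp:.2f}, sufficiency={suff:.2f}")
```

Under these definitions, a faithful explanation gives a high comprehensiveness score (removing the highlighted terms hurts the prediction) and a sufficiency score near zero (the highlighted terms alone nearly reproduce the prediction). In practice `predict` would wrap the trained classifier, and the rationale would come from the relevance scores produced by the explanation method.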