2025
Conference Paper
Title
Agent-Based Hate Speech Moderation Approach
Abstract
Detecting problematic content depends heavily on context, yet personal details such as age, language, and nationality often remain inaccessible due to privacy concerns. Additionally, platforms face diverse local laws on online hate speech and must evaluate content against their own ethical standards. This study introduces a novel agent-based system that adheres to requirements imposed by local privacy and data protection regulations, such as the European General Data Protection Regulation (GDPR), and integrates legal and ethical reasoning into the content moderation process. By leveraging user information, the system increases the transparency of moderation decisions. The research details two key use cases essential for online communication, utilizing technologies such as GPT-3.5, Solid Pods, and the Prova rule language. The first use case focuses on protecting adolescents from potentially harmful content by restricting certain posts in the presence of minors. The second involves detecting problematic statements and generating counter-responses tailored to personal attributes. This work sets the stage for future compliance with the Digital Services Act (DSA) by proposing an innovative way to navigate varying legal and ethical definitions of hate speech and to formulate appropriate counter-responses. The study discusses the agent-based system, comprising an agent for hate speech detection, the chat platform as an agent, and an agent responsible for legal and ethical reasoning in Prova, and highlights the advantages for content moderation and algorithmic hate speech detection. It also specifies key factors for DSA compliance.
Author(s)
Mainwork
Communications in Computer and Information Science
Conference
1st International Workshop on Causality, Agents and Large Models, CALM 2024