2025
Conference Paper
Title
Scalable, Context-Aware NLP Moderation for Child Safety: A Multi-Agent Ethical and Legal Compliance Framework
Abstract
Protecting children from harmful online content requires systems that are accurate, adaptive, and legally compliant across jurisdictions. This paper presents a hybrid, rule-based, multi-agent moderation architecture designed to detect and mitigate toxic speech in real time while ensuring adherence to diverse legal and ethical standards. The system employs large language models, including Google Gemini, GPT-4o-nano, and GPT-4o, to classify user messages according to a detailed hate speech taxonomy. In addition to use-case-specific ethical rules, the approach dynamically identifies the applicable legal frameworks (e.g., COPPA, GDPR, DSA) based on the participants’ country of origin and uses LLM-driven agents to encode the relevant legal obligations as executable rules in Prolog. These rules form the basis for a legal and ethical reasoning agent, so moderation decisions are context-sensitive, policy-aligned, and legally grounded. System performance was evaluated on a human-annotated dataset of illegal hate speech, demonstrating its effectiveness in identifying content that violates legal definitions. By integrating unsupervised classification with symbolic rule-based reasoning, the system offers a scalable, reliable solution for protecting children and others in online communication environments.
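The jurisdiction-detection step described in the abstract — mapping a participant's country of origin (and, for COPPA, age) to the legal frameworks that apply — could be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name, the country codes, and the framework-selection rules are all assumptions, and the abbreviated EU list stands in for the full membership.

```python
# Hypothetical sketch of the jurisdiction-detection step: map a participant's
# country (and age, for COPPA) to potentially applicable legal frameworks.
# All names and rules here are illustrative assumptions, not the paper's code.

EU_COUNTRIES = {"DE", "FR", "IT", "ES", "NL", "IE", "PL", "SE"}  # abbreviated list

def applicable_frameworks(country: str, user_age: int) -> set[str]:
    """Return the set of legal frameworks that may apply to a participant."""
    frameworks: set[str] = set()
    if country == "US" and user_age < 13:
        frameworks.add("COPPA")   # US children's online privacy rule
    if country in EU_COUNTRIES:
        frameworks.add("GDPR")    # EU data protection regulation
        frameworks.add("DSA")     # EU Digital Services Act
    return frameworks

print(sorted(applicable_frameworks("DE", 12)))  # ['DSA', 'GDPR']
```

In the architecture the abstract describes, the selected frameworks would then drive LLM agents that emit the corresponding obligations as Prolog rules for the reasoning agent to consult.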
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
Language
English