MultiProp Framework: Ensemble Models for Enhanced Cross-Lingual Propaganda Detection in Social Media and News using Data Augmentation, Text Segmentation, and Meta-Learning

Aldabbas, Farizeh; Ashraf, Shaina; Sifa, Rafet; Flek, Lucie

2025

Conference Paper

Abstract

Propaganda, a pervasive tool for influencing public opinion, demands robust automated detection systems, particularly for under-resourced languages. Current efforts largely focus on well-resourced languages like English, leaving significant gaps in languages such as Arabic. This research addresses these gaps by introducing the MultiProp Framework, a cross-lingual meta-learning framework designed to enhance propaganda detection across multiple languages, including Arabic, German, Italian, French, and English. We constructed a multilingual dataset using data translation techniques, beginning with Arabic data from the PTC and WANLP shared tasks, expanded it with translations into German, Italian, and French, and further enriched it with the SemEval23 dataset. Our proposed framework encompasses three distinct models: MultiProp-Baseline, which combines ensembles of pre-trained models such as GPT-2, mBART, and XLM-RoBERTa; MultiProp-ML, designed to handle languages with minimal or no training data by utilizing advanced meta-learning techniques; and MultiProp-Chunk, which overcomes the challenges of processing longer texts that exceed the token limits of pre-trained models. Together, they deliver superior performance compared to state-of-the-art methods, representing a significant advancement in the field of cross-lingual propaganda detection.
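
The abstract does not say how MultiProp-Chunk segments long inputs. One common way to handle texts that exceed a model's token limit is to split them into overlapping windows, score each window, and average the scores. The sketch below illustrates only that general idea. It is not the authors' implementation, and `chunk_tokens`, `classify_long_text`, `score_chunk`, `max_len`, and `stride` are hypothetical names chosen for this example.

```python
from typing import Callable, List, Sequence


def chunk_tokens(tokens: Sequence[str], max_len: int, stride: int) -> List[List[str]]:
    """Split tokens into windows of at most max_len tokens, advancing by stride.

    Consecutive windows overlap by (max_len - stride) tokens when stride < max_len.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if stride > max_len:
        raise ValueError("stride larger than max_len would skip tokens")
    if not tokens:
        return []

    chunks: List[List[str]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + max_len]))
        # Stop once the current window reaches the end of the text.
        if start + max_len >= len(tokens):
            break
        start += stride
    return chunks


def classify_long_text(
    tokens: Sequence[str],
    score_chunk: Callable[[List[str]], float],
    max_len: int,
    stride: int,
) -> float:
    """Average a per-chunk score over all windows (assumed aggregation rule)."""
    chunks = chunk_tokens(tokens, max_len, stride)
    if not chunks:
        return 0.0
    return sum(score_chunk(chunk) for chunk in chunks) / len(chunks)
```

In practice `score_chunk` would wrap a fine-tuned classifier, and the mean could be replaced by a maximum or a majority vote, depending on whether a single propagandistic passage should be enough to flag the whole document.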

Author(s)

Aldabbas, Farizeh

Universität Bonn

Ashraf, Shaina

Universität Bonn

Sifa, Rafet

Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Flek, Lucie

Universität Bonn

Mainwork

AbjadNLP 2025, The 1st Workshop on NLP for Languages Using Arabic Script. Proceedings of the Workshop

Conference

Workshop on NLP for Languages Using Arabic Script 2025

International Conference on Computational Linguistics 2025
