ReDoS-M: A Dataset of Multi-Label Regrettable Disclosures on Social Media

Simo Fhom, Hervais-Clemence; Kreutzer, Michael; Nikolov, Javor

doi:10.5220/0014632900004061

2026

Conference Paper

Abstract

Research on automated detection of regrettable disclosures in online social networks (OSN) is limited by the lack of large-scale, semantically rich, and fine-grained annotated datasets. Existing datasets often provide narrow coverage and conflate regret with related phenomena such as toxicity or hate speech, which hinders robust modeling of regret-specific cues. To address these limitations, we introduce ReDoS-M, a large-scale, multi-source corpus constructed via a hybrid annotation pipeline that combines crowdsourced labeling, transformer-based self-training, and enrichment with Sentiment-Moral-Emotion (SME) features. Starting from a collection of more than 5.5M user-generated posts and comments gathered from platforms such as Reddit and X (formerly Twitter), we derive four complementary corpora, ranging from 4.27M to 5.13M annotated items, that reflect different annotation and label-fusion strategies. We assess ReDoS-M in terms of label coverage and downstream utility by training and evaluating six transformer-based models (DeBERTa and XLM-RoBERTa variants, with and without SME features and large language model (LLM)-generated features). Across the ReDoS-M corpora, all six models achieve strong performance, with micro-F1 scores exceeding 0.98 and AUC values above 0.99 in the best settings, demonstrating that ReDoS-M supports effective and generalizable detection of regrettable OSN disclosures. Overall, ReDoS-M constitutes a comprehensive and scalable foundation for advancing research on fine-grained modeling and classification of regrettable disclosures in OSN environments.

Author(s)

Simo Fhom, Hervais-Clemence

Fraunhofer-Institut für Sichere Informationstechnologie SIT

Kreutzer, Michael

Fraunhofer-Institut für Sichere Informationstechnologie SIT

Nikolov, Javor

Fraunhofer-Institut für Sichere Informationstechnologie SIT

Mainwork

ICISSP 2026, 12th International Conference on Information Systems Security and Privacy. Proceedings. Vol.1

Conference

International Conference on Information Systems Security and Privacy 2026
