Detecting Audio-Text Decontextualization through Entailment and Semantic Analysis

Gerhardt, Milica; Cuccovillo, Luca; Aichroth, Patrick

doi:10.1145/3810988.3812666

June 15, 2026

Conference Paper

Abstract

Audio-text decontextualization is a form of real-world misinformation in which genuine audio recordings (speech excerpts, news clips, interviews) are detached from their authentic context and paired with misleading textual narratives. Addressing it in practice requires both audio provenance analysis and context analysis: provenance analysis retrieves candidate source recordings, while context analysis determines whether the recovered source supports the narrative attached to the post. This paper presents three context-analysis pipelines and their cascade combinations, and evaluates them on the M3A dataset alongside four audio-language baselines. We show that a substantial fraction of M3A manipulations are fundamentally undetectable from audio-text content alone, and that on the subset where detection is possible our best pipelines reach 0.73 accuracy on Named Entity Manipulation (NEM) and 0.92 on Multimodal Misalignment (MM) audio swap. Building on these findings, we formulate an operational workflow for real-world investigations and demonstrate it on three case studies, which also motivate a lightweight linguistic middle layer that targets dropped conditional and modal/hedging framing. This leads to two practical deployment recommendations: (1) a fast bulk-screening pipeline that flags context-stripping attacks via entailment failure; and (2) a large language model (LLM)-based deep-verification pipeline, reserved for the most suspicious cases, that can reason explicitly about framing shifts.
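The two deployment recommendations can be read as a two-stage cascade: a cheap entailment check screens every post, and only the posts that fail it are passed on to a slower verifier. The sketch below illustrates that control flow only and is not the authors' implementation. The names `Post`, `cascade`, `toy_entailment`, and `toy_verifier`, the 0.5 threshold, and the verdict labels are all assumptions made for illustration; the two toy functions are placeholders standing in for a real entailment model and an LLM-based verifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Post:
    post_id: str
    source_transcript: str  # transcript of the recovered source recording
    claim_text: str         # narrative attached to the post


# Hypothetical interfaces: any entailment model or LLM wrapper could be plugged in.
EntailmentScorer = Callable[[str, str], float]  # (premise, hypothesis) -> score in [0, 1]
DeepVerifier = Callable[[Post], bool]           # True if the claim is judged misleading


def cascade(posts: List[Post], entail: EntailmentScorer,
            verify: DeepVerifier, threshold: float = 0.5) -> Dict[str, str]:
    """Screen all posts with the cheap scorer; escalate only entailment failures."""
    verdicts: Dict[str, str] = {}
    for post in posts:
        score = entail(post.source_transcript, post.claim_text)
        if score >= threshold:
            verdicts[post.post_id] = "supported"
        else:
            # Entailment failed: hand the post to the expensive second stage.
            verdicts[post.post_id] = "misleading" if verify(post) else "supported"
    return verdicts


def toy_entailment(premise: str, hypothesis: str) -> float:
    """Placeholder scorer: fraction of distinct claim tokens found in the transcript."""
    premise_tokens = set(premise.lower().split())
    claim_tokens = set(hypothesis.lower().split())
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & premise_tokens) / len(claim_tokens)


HEDGES = {"if", "might", "could", "may", "would"}  # illustrative list, not exhaustive


def toy_verifier(post: Post) -> bool:
    """Placeholder verifier: flags claims that drop a hedge present in the source."""
    source_hedges = HEDGES & set(post.source_transcript.lower().split())
    claim_hedges = HEDGES & set(post.claim_text.lower().split())
    return bool(source_hedges - claim_hedges)


transcript = "the minister said taxes might rise if growth slows"
demo_posts = [
    Post("a", transcript, "the minister said taxes might rise if growth slows"),
    Post("b", transcript, "taxes will rise next month"),
]
verdicts = cascade(demo_posts, toy_entailment, toy_verifier)
```

In this toy run, post `a` passes the screening stage directly, while post `b` fails it and is escalated. Keeping both stages behind plain callables means the screening model and the verifier can be swapped independently, which is the main point of a cascade design.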

Author(s)

Gerhardt, Milica

Fraunhofer-Institut für Digitale Medientechnologie IDMT

Cuccovillo, Luca

Fraunhofer-Institut für Digitale Medientechnologie IDMT

Aichroth, Patrick

Fraunhofer-Institut für Digitale Medientechnologie IDMT

Mainwork

MAD 2026, 5th ACM International Workshop on Multimedia AI against Disinformation. Proceedings

Conference

International Workshop on Multimedia AI against Disinformation 2026
