June 15, 2026
Conference Paper
Title
Detecting Audio-Text Decontextualization through Entailment and Semantic Analysis
Abstract
Audio-text decontextualization is a form of real-world misinformation in which genuine audio recordings (speech excerpts, news clips, interviews) are detached from their authentic context and paired with misleading textual narratives. Addressing it in practice requires both audio provenance analysis and context analysis: provenance analysis retrieves candidate source recordings, while context analysis determines whether the recovered source supports the narrative attached to the post. This paper presents three context-analysis pipelines and their cascade combinations, and evaluates them on the M3A dataset alongside four audio-language baselines. We show that a substantial fraction of M3A manipulations are fundamentally undetectable from audio-text content alone, and that on the subset where detection is possible our best pipelines reach 0.73 accuracy on Named Entity Manipulation (NEM) and 0.92 on Multimodal Misalignment (MM) audio swap. Building on these findings, we formulate an operational workflow for real-world investigations and demonstrate it on three case studies, which also motivate a lightweight linguistic middle layer for cases where conditional or modal/hedging framing is dropped. These results lead to two practical deployment recommendations: (1) a fast bulk-screening pipeline that flags context-stripping attacks via entailment failure; and (2) a large language model (LLM)-based deep-verification pipeline for the most suspicious cases, capable of explicit reasoning about framing shifts.
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
Language
English