June 30, 2025
Conference Paper
Title
Towards Explainable Person-of-Interest-based Audio Synthesis Detection
Abstract
Generalization and explainability are two key challenges in synthetic audio detection. Effective detectors should not only reliably classify unseen data from unknown synthesis algorithms, but also provide insight into their decision-making process and explain why a given input was classified as real or fake. To promote generalization, we use the Person-of-Interest approach, which allows us to detect synthetic audio using a model trained only on real data, provided that some pristine audio of the putative speaker is available. To support explainability, we use an encoder-decoder backbone so that the bottleneck features ensure syntactic and semantic fidelity to the input and enable reliable decisions. Experiments show that our approach outperforms both state-of-the-art models based on supervised learning and methods based on speaker verification.
Author(s)