June 30, 2025
Conference Paper
Title
Calibrating POI-based Synthetic Speech Detection
Abstract
Recent advances in deep learning have yielded increasingly sophisticated speech generation systems (Text-To-Speech or Voice Conversion algorithms) capable of producing realistic synthetic speech that is often indistinguishable from human voices. Although these technologies support a wide range of legitimate applications, they also facilitate malicious uses, including impersonation and misinformation, thereby posing significant societal threats. As a result, synthetic speech detection has emerged as an urgent research focus. Despite numerous proposed methods, a persistent generalization problem remains: detectors struggle to classify out-of-domain samples unseen during training, and therefore fail to adapt and remain consistent in real-world scenarios.
We tackle this limitation with a Person-of-Interest framework that exploits speaker-specific characteristics for Synthetic Speech Detection, thereby enhancing generalizability across diverse generators. Specifically, we introduce an ensemble approach that addresses a previously unstudied calibration problem: the system uses only recording-level statistics to self-calibrate, leveraging the abstraction capabilities of a large-scale, pre-trained audio model. Experiments demonstrate that our method achieves strong performance, high generalizability, and robustness across various datasets.
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
Language
English