Fraunhofer-Gesellschaft
2024
Conference Paper
Title

Audio Spectrogram Transformer for Synthetic Speech Detection via Speech Formant Analysis

Abstract
In this paper, we address the challenge of synthetic speech detection, which has become increasingly important due to the latest advancements in text-to-speech and voice conversion technologies. We propose a novel multi-task neural network architecture, designed to be interpretable and specifically tailored for audio signals. The architecture includes a feature bottleneck, which is used to autoencode the input spectrogram, predict the fundamental frequency (f0) trajectory, and classify the speech as synthetic or natural. Hence, synthesis detection can be considered a byproduct of attending to the energy distribution among vocal formants, which gives a clear understanding of which characteristics of the input signal influence the final outcome. Our evaluation on the ASVspoof 2019 LA partition indicates better performance than the current state of the art, with an AUC score of 0.900.
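For readers unfamiliar with the reported metric, the following is a minimal sketch of how an AUC score can be computed from detector outputs. It is not the authors' evaluation code: the function name `roc_auc`, the label convention (1 = synthetic, 0 = natural), and the toy scores are all assumptions made for illustration. The sketch uses the standard equivalence between the area under the ROC curve and the probability that a randomly chosen synthetic sample receives a higher score than a randomly chosen natural one.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for synthetic, 0 for natural (assumed convention).
    scores: detector outputs, higher meaning "more likely synthetic".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # synthetic sample ranked above natural sample
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from ASVspoof 2019.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of synthetic and natural speech. The pairwise loop is quadratic in the number of samples, so a rank-based implementation would be preferable for a full evaluation set.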
Author(s)
Cuccovillo, Luca  
Fraunhofer-Institut für Digitale Medientechnologie IDMT  
Gerhardt, Milica  
Fraunhofer-Institut für Digitale Medientechnologie IDMT  
Aichroth, Patrick  
Fraunhofer-Institut für Digitale Medientechnologie IDMT  
Mainwork
IEEE Workshop on Information Forensics and Security, WIFS 2023  
Conference
International Workshop on Information Forensics and Security 2023  
DOI
10.1109/WIFS58808.2023.10374615
Language
English
Fraunhofer-Institut für Digitale Medientechnologie IDMT  
Keyword(s)
  • Audio forensics
  • Media forensics
  • Audio deepfakes