June 30, 2025
Conference Paper
Title

AI Got Your Tongue? Analysing the Sounds of Audio Deepfake Generation Methods

Abstract
In current research, audio deepfake detectors are trained to find differences between bona-fide and spoofed samples. A variety of generation methods exist, mostly divided into voice conversion (VC) and text-to-speech synthesis (TTS). We assume that these generation methods lead to specific artefacts in the generated recording. To test this, we created a test set of spoofs containing the same linguistic content and target speakers as a bona-fide counterpart, using four VC and four TTS models. We applied feature representation methods to compare 1) bona-fide vs. spoofed samples, 2) samples created using VC vs. TTS, and 3) samples created using the individual generation methods. We found differences between spoofed and bona-fide samples: spoofs showed overall higher deflections in the waveform and overall smaller values in the spectral evaluation. In the spectral domain, several differences between VC and TTS were detected. XTTS and kNN-VC stood out in the spectral features, e.g. spectral contrast. The samples created using RVC seemed to be the most similar to bona-fide samples. MFCC and LFCC were the most effective at identifying the differences between bona-fide and spoofed audio, making them a suitable choice for detecting audio deepfakes.
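As a rough illustration of the two kinds of comparison the abstract mentions (waveform deflection in the time domain and magnitude values in the spectral domain), the following minimal sketch computes one simple summary value for each. It is not the paper's pipeline: the function names, the naive DFT, and the synthetic sine signals are assumptions made for this example, and the features named in the abstract (MFCC, LFCC, spectral contrast) are not reproduced here.

```python
import cmath
import math


def peak_amplitude(samples):
    """Largest absolute sample value, a simple time-domain deflection measure."""
    return max(abs(s) for s in samples)


def magnitude_spectrum(samples):
    """Magnitudes of the non-negative-frequency bins of a naive O(n^2) DFT."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def mean_spectral_magnitude(samples):
    """Average magnitude over all DFT bins, a crude spectral summary value."""
    spectrum = magnitude_spectrum(samples)
    return sum(spectrum) / len(spectrum)


def sine(freq_hz, amplitude, sample_rate=800, n=64):
    """Hypothetical helper: a short synthetic tone standing in for a recording."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]


# Synthetic stand-ins only; these are NOT samples or results from the paper.
reference = sine(100, 0.5)
louder = sine(100, 0.9)

for name, signal in (("reference", reference), ("louder", louder)):
    print(name, peak_amplitude(signal), mean_spectral_magnitude(signal))
```

On real recordings, the same two functions could be applied to a bona-fide sample and its spoofed counterpart and the resulting values compared pair by pair; a practical analysis would use an FFT and framed features rather than a single DFT over the whole signal.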
Author(s)
Schäfer, Karla
Fraunhofer-Institut für Sichere Informationstechnologie SIT  
Mainwork
ICMR 2025, International Conference on Multimedia Retrieval. Proceedings  
Conference
International Conference on Multimedia Retrieval 2025  
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
DOI
10.1145/3731715.3734425
10.24406/publica-5147
Language
English
Keyword(s)
  • Audio Deepfakes
  • Feature Analysis
  • Time Domain
  • Spectral Features
