2025
Conference Paper
Title

Personalized Speech Synthesis for Zero-Shot Keyword Spotting

Abstract
Usually, keyword spotting (KWS) systems can only detect the specific keywords they were trained to detect. Moreover, a sufficiently large number of spoken samples needs to be provided for each keyword, which may be impractical, particularly in dynamic applications. In this work, we present a methodology for generating synthetic speech samples to enhance KWS models, specifically to adapt to unseen words associated with known speakers. We first refine the SoundStream neural encoder to achieve high-quality encoding and decoding of the target speaker's voice. Subsequently, we adapt the SpearTTS model to create phonetically diverse sentences through a use-case generator module. The generated sentences are then strongly labeled to capture individual words. In experiments, we trained a template-based KWS model using this synthetic dataset and evaluated its performance against a set of real-world data. Our findings demonstrate the efficacy of synthetic data in improving KWS adaptability to new vocabularies.
Author(s)
Gökgöz, Fahrettin  
Fraunhofer-Institut für Kommunikation, Informationsverarbeitung und Ergonomie FKIE  
Cornaggia-Urrigshardt, Alessia  
Fraunhofer-Institut für Kommunikation, Informationsverarbeitung und Ergonomie FKIE  
Wilkinghoff, Kevin
Aalborg University
Mainwork
Speech Communication. 16th ITG Conference 2025  
Conference
Conference on Speech Communication 2025  
Language
English