July 31, 2025
Journal Article
Title

Speech-dependent data augmentation for own voice reconstruction with hearable microphones in noisy environments

Abstract
Hearable devices, equipped with one or more microphones, can be used to capture the user’s own voice in noisy environments. In such environments, an own voice reconstruction (OVR) system is needed to enhance the quality and intelligibility of the recorded own voice. In this work, we aim to estimate clean broadband speech from a microphone at the outer face of the hearable and an in-ear microphone, which captures the own voice at a higher signal-to-noise ratio than the outer microphone, but with a limited bandwidth and additive body-produced noise. Training a supervised deep learning-based OVR system requires a substantial amount of own voice signals as training data. Such training data can be collected by recording many utterances from different talkers wearing the hearable, which is costly, or generated by augmenting existing clean speech datasets. In this paper, we investigate several data augmentation techniques to simulate a large amount of in-ear own voice signals from a limited amount of recorded own voice signals. More specifically, we consider different models for the own voice transfer characteristics between the outer microphone and the in-ear microphone, ranging from a fixed talker-averaged relative transfer function to a phoneme-dependent individual model. We investigate the influence of the amount of recorded own voice signals on the performance of an OVR system based on the FT-JNF architecture, either by directly using the recorded signals for training or by using the recorded signals to generate augmented data for training (with and without fine-tuning with recorded signals). Experimental results show that training using the proposed speech-dependent individual data augmentation technique and additional fine-tuning with recorded signals yields the best performance in terms of objective metrics, even when only a few recorded own voice signals are available.
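The two ends of the model range named in the abstract (a fixed transfer characteristic versus a speech-dependent one) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the time-domain impulse-response representation, the per-frame labels, and the white-noise stand-in for body-produced noise are all assumptions made for illustration only.

```python
import numpy as np


def simulate_in_ear(outer, rel_ir, noise_std=0.0, rng=None):
    """Simulate an in-ear signal with one fixed relative impulse response.

    outer     : 1-D array, speech at the outer microphone (assumed clean here)
    rel_ir    : 1-D array, hypothetical relative impulse response
                (time-domain counterpart of a relative transfer function)
    noise_std : standard deviation of additive white noise, a crude
                stand-in for body-produced noise (assumption)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Full linear convolution, truncated to the input length.
    filtered = np.convolve(outer, rel_ir)[: len(outer)]
    return filtered + noise_std * rng.standard_normal(len(outer))


def simulate_in_ear_speech_dependent(outer, labels, rel_irs, frame_len,
                                     noise_std=0.0, rng=None):
    """Simulate an in-ear signal with a filter chosen per frame.

    labels  : one (hypothetical) phoneme-class label per frame
    rel_irs : dict mapping each label to its relative impulse response
    Samples beyond len(labels) * frame_len are treated as silence.
    """
    rng = np.random.default_rng() if rng is None else rng
    max_ir = max(len(h) for h in rel_irs.values())
    out = np.zeros(len(outer) + max_ir - 1)
    for i, label in enumerate(labels):
        start = i * frame_len
        frame = outer[start:start + frame_len]
        if len(frame) == 0:
            break
        # Filter the frame with its label's response and overlap-add the tail.
        y = np.convolve(frame, rel_irs[label])
        out[start:start + len(y)] += y
    out = out[: len(outer)]
    return out + noise_std * rng.standard_normal(len(outer))


# Toy usage: random noise stands in for speech, and moving-average
# (low-pass) filters stand in for the band-limited in-ear characteristic.
demo_rng = np.random.default_rng(0)
speech = demo_rng.standard_normal(1600)
irs = {"vowel": np.ones(16) / 16, "fricative": np.ones(4) / 4}
frame_labels = ["vowel", "fricative"] * 8  # 16 frames of 100 samples
in_ear = simulate_in_ear_speech_dependent(
    speech, frame_labels, irs, frame_len=100, noise_std=0.01, rng=demo_rng
)
```

When every frame carries the same label, the speech-dependent variant reduces to the fixed model, which makes the fixed talker-averaged case a special case of the same routine.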
Author(s)
Ohlenbusch, Mattes  
Fraunhofer-Institut für Digitale Medientechnologie IDMT  
Rollwage, Christian  
Fraunhofer-Institut für Digitale Medientechnologie IDMT  
Doclo, Simon  
Fraunhofer-Institut für Digitale Medientechnologie IDMT  
Journal
EURASIP Journal on Audio, Speech, and Music Processing (EURASIP JASMP)
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
DOI
10.1186/s13636-025-00418-1
10.24406/publica-7485
Language
English
Keyword(s)
  • Data augmentation
  • Hearables
  • Multi-microphone speech enhancement
  • Own voice reconstruction
  • Voice pickup
  • Individual models
