2025
Conference Paper
Title

UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension

Abstract
In practical applications of speech codecs, factors such as the quality of the radio connection, hardware limitations, or the required user experience necessitate trade-offs between achievable perceptual quality, resulting bitrate, and computational complexity. Most conventional and neural speech codecs operate on wideband (WB) speech signals to achieve this compromise. To further enhance the perceptual quality of coded speech, bandwidth extension (BWE) of the transmitted speech is an attractive and popular technique in conventional speech coding. In contrast, neural speech codecs are typically trained end-to-end to a specific set of requirements and are often not easily adaptable. In particular, they are typically trained to operate at a single fixed sampling rate. With the Universal Bandwidth Extension Generative Adversarial Network (UBGAN), we propose a modular and lightweight GAN-based solution that increases the operational flexibility of a wide range of conventional and neural codecs. Our model operates in the subband domain and extends the audio bandwidth of WB signals from 8 kHz to 16 kHz, resulting in super-wideband (SWB) signals. We further introduce two variants, guided-UBGAN and blind-UBGAN: the guided version transmits a quantized learned representation as side information at a very low bitrate in addition to the bitrate of the codec, while the blind version operates without such side information. Our subjective assessments demonstrate the advantage of UBGAN applied to WB codecs and highlight the generalization capacity of our proposed method across multiple codecs and bitrates.
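To make the WB-to-SWB step concrete, the following minimal sketch shows the gap that any bandwidth extension has to fill. It is not the UBGAN model and does not reproduce the authors' subband processing. It only resamples a synthetic WB frame to the SWB rate and measures that the 8 kHz to 16 kHz band stays empty. The sampling rates (16 kHz for WB, 32 kHz for SWB) follow from the bandwidths stated in the abstract via the Nyquist relation; the function names, test tones, and frame length are illustrative assumptions.

```python
import cmath
import math

WB_RATE = 16_000   # wideband sampling rate in Hz (8 kHz audio bandwidth)
SWB_RATE = 32_000  # super-wideband sampling rate in Hz (16 kHz audio bandwidth)


def dft(x):
    """Naive O(n^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def upsample_2x(x):
    """Ideal 2x interpolation: zero-pad the spectrum, then invert it.

    Assumes an even-length real frame with no energy at its Nyquist bin.
    """
    n = len(x)
    half = n // 2
    spec = dft(x)
    # Insert n empty bins between the positive and negative frequencies.
    padded = spec[:half] + [0j] * n + spec[half:]
    m = len(padded)
    # Inverse DFT; the factor 2 keeps the time-domain amplitude unchanged.
    return [2 * sum(padded[k] * cmath.exp(2j * math.pi * k * t / m)
                    for k in range(m)).real / m
            for t in range(m)]


def band_energy(x, rate, lo_hz, hi_hz):
    """Spectral energy of x in the half-open band [lo_hz, hi_hz)."""
    n = len(x)
    spec = dft(x)
    return sum(abs(spec[k]) ** 2
               for k in range(n // 2 + 1)
               if lo_hz <= k * rate / n < hi_hz)


# Hypothetical WB frame: two tones (1 kHz and 5 kHz) below the 8 kHz limit.
wb = [math.sin(2 * math.pi * 1000 * t / WB_RATE)
      + math.sin(2 * math.pi * 5000 * t / WB_RATE)
      for t in range(64)]

swb = upsample_2x(wb)  # now sampled at SWB_RATE, but still WB content only
low = band_energy(swb, SWB_RATE, 0, 8_000)
high = band_energy(swb, SWB_RATE, 8_000, 16_000)
print(f"energy below 8 kHz: {low:.1f}, energy in 8-16 kHz: {high:.3e}")
```

Plain resampling raises the sampling rate without adding any content above 8 kHz. A BWE method has to synthesize that upper band, either from the decoded WB signal alone (blind) or with the help of transmitted side information (guided).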
Author(s)
Gupta, Kishan
Fraunhofer-Institut für Integrierte Schaltungen IIS  
Korse, Srikanth
International Audio Laboratories Erlangen
Brendel, Andreas
Fraunhofer-Institut für Integrierte Schaltungen IIS  
Pia, Nicola
Fraunhofer-Institut für Integrierte Schaltungen IIS  
Fuchs, Guillaume  
Fraunhofer-Institut für Integrierte Schaltungen IIS  
Mainwork
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2025. Proceedings  
Conference
Workshop on Applications of Signal Processing to Audio and Acoustics 2025  
Open Access
DOI
10.1109/WASPAA66052.2025.11230926
Language
English