2025
Conference Paper
Title

On the Relation Between Speech Quality and Quantized Latent Representations of Neural Codecs

Abstract
Neural audio signal codecs have attracted significant attention in recent years. In essence, the impressively low bitrates achieved by such codecs are enabled by learning an abstract representation that captures the properties of the encoded signals, e.g., speech. In this work, we investigate the relation between the latent representation of the input signal learned by a neural codec and the quality of speech signals. To do so, we introduce Latent-representation-to-Quantization error Ratio (LQR) measures, which quantify how far a given speech signal lies from the idealized speech signal model of the neural codec. We compare the proposed metrics to intrusive measures as well as data-driven supervised methods using two subjective speech quality datasets. This analysis shows that the proposed LQR correlates strongly (a Pearson correlation of up to 0.9) with the subjective quality of speech. Despite being non-intrusive, LQR performs competitively with, or even better than, other pre-trained and intrusive measures. These results show that LQR is a promising basis for more sophisticated speech quality measures.
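The abstract names the LQR measures but does not give their definition. As a rough illustration only, the sketch below assumes an SNR-style form: the energy of the unquantized latent vectors divided by the energy of the quantization error, in dB. The nearest-neighbour codebook, the toy latent vectors, and the function names `nearest_codeword`, `lqr_db`, and `pearson` are all hypothetical and are not taken from the paper. The `pearson` helper shows the kind of correlation the abstract reports between a metric and subjective scores.

```python
import math

def nearest_codeword(z, codebook):
    """Return the codebook vector closest to z in squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((zi - ci) ** 2 for zi, ci in zip(z, c)))

def lqr_db(latents, codebook, eps=1e-12):
    """Assumed SNR-style ratio in dB (not the paper's confirmed definition):
    10 * log10(sum ||z||^2 / sum ||z - q(z)||^2), summed over all frames."""
    latent_energy = 0.0
    error_energy = 0.0
    for z in latents:
        q = nearest_codeword(z, codebook)
        latent_energy += sum(zi ** 2 for zi in z)
        error_energy += sum((zi - qi) ** 2 for zi, qi in zip(z, q))
    # eps guards against log of zero when the error vanishes
    return 10.0 * math.log10((latent_energy + eps) / (error_energy + eps))

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical 2-D codebook and latent frames, for illustration only.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
close_to_codebook = [(0.9, 0.1), (0.1, 1.0), (1.0, 0.9)]
far_from_codebook = [(0.6, 0.4), (0.4, 0.7), (0.7, 0.5)]

print(lqr_db(close_to_codebook, codebook))
print(lqr_db(far_from_codebook, codebook))
```

Under this assumed form, latents that sit close to codebook entries give a higher ratio than latents that fall between entries, which matches the intuition that a signal well described by the codec's model quantizes with little error. No reference signal is needed, which is what makes such a measure non-intrusive.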
Author(s)
Halimeh, Mhd Modar
Fraunhofer-Institut für Integrierte Schaltungen IIS  
Torcoli, Matteo  
Fraunhofer-Institut für Integrierte Schaltungen IIS  
Grundhuber, Philipp
Fraunhofer-Institut für Integrierte Schaltungen IIS  
Habets, Emanuel  
Fraunhofer-Institut für Integrierte Schaltungen IIS  
Mainwork
IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2025. Proceedings  
Conference
International Conference on Acoustics, Speech and Signal Processing 2025  
DOI
10.1109/ICASSP49660.2025.10890357
Language
English