2025
Conference Paper
Title

Discrete Audio Representations from SoundStream: A Dual Approach to Efficient Transmission and Speech Detection

Abstract
This paper investigates the application of SoundStream, a state-of-the-art neural audio codec, to efficient audio transmission and speech detection in resource-constrained environments. We analyze SoundStream's architecture, emphasizing its use of Residual Vector Quantization (RVQ) to create compact, discrete audio representations that preserve essential audio features. We also introduce a novel voice activity detection (VAD) algorithm designed to identify relevant speech segments within the transmitted audio. Our evaluation uses objective metrics: Deep Noise Suppression Mean Opinion Score (DNSMOS), Non-Intrusive Speech Quality Assessment (NISQA), Short-Time Objective Intelligibility (STOI), and the density distribution of speech over codebooks. We assess the performance of the VAD using the F-measure. The results show that SoundStream maintains high audio fidelity and intelligibility across varying encoding stages, and that the VAD algorithm detects speech effectively. This study highlights the potential of these methods to enhance audio processing in diverse applications, particularly where bandwidth and clarity are critical. This paper was originally presented at the NATO Science and Technology Organization Symposium (ICMCIS) organized by the Information Systems Technology (IST) Scientific and Technical Committee, IST-209-RSY - the ICMCIS, held in Oeiras, Portugal, 13-14 May 2025.
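The abstract names Residual Vector Quantization without describing its mechanics. As a rough illustration of the general technique only (not the paper's implementation, and not SoundStream's learned codebooks), the sketch below quantizes a vector in stages, with each stage encoding the residual left by the previous one. The tiny hand-written codebooks, the example vector, and the function names `nearest`, `rvq_encode`, and `rvq_decode` are all assumptions made for this example.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for i, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(entry, vec))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the remaining residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        i = nearest(codebook, residual)
        indices.append(i)
        # Subtract the chosen entry so the next stage sees only what is left.
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected entry of each stage.

    Passing fewer indices than there are codebooks gives a coarser
    reconstruction that uses only the first stages.
    """
    out = [0.0] * len(codebooks[0][0])
    for i, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out


# Hypothetical two-stage codebooks: a coarse stage, then a finer stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.25], [-0.25, 0.25], [0.25, -0.25], [-0.25, -0.25]],
]

frame = [1.3, -0.7]                               # hypothetical feature vector
codes = rvq_encode(frame, CODEBOOKS)              # one small integer per stage
coarse = rvq_decode(codes[:1], CODEBOOKS)         # first stage only
fine = rvq_decode(codes, CODEBOOKS)               # both stages
```

Only the integer indices need to be transmitted. A stage with N codebook entries costs about log2(N) bits per vector, so dropping later stages lowers the bitrate at the expense of a coarser reconstruction.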
Author(s)
Gökgöz, Fahrettin  
Fraunhofer-Institut für Kommunikation, Informationsverarbeitung und Ergonomie FKIE  
Ali, Hisham
Fraunhofer-Institut für Kommunikation, Informationsverarbeitung und Ergonomie FKIE  
Pal, Priya
Fraunhofer-Institut für Kommunikation, Informationsverarbeitung und Ergonomie FKIE  
Mainwork
International Conference on Military Communications and Information Systems, ICMCIS 2025  
Conference
International Conference on Military Communications and Information Systems 2025  
DOI
10.1109/ICMCIS64378.2025.11047823
Language
English
Fraunhofer-Institut für Kommunikation, Informationsverarbeitung und Ergonomie FKIE  
Keyword(s)
  • audio quality assessment
  • neural audio codec
  • residual vector quantization
  • voice activity detection
