2025
Conference Paper
Title

Real-Time Single-Channel Speaker-Conditioned Target Speaker Extraction Using TCN-Conformer with Efficient Self-Attention Mechanisms

Abstract
Speaker-conditioned target speaker extraction systems aim to extract the speech of a target speaker from a mixture of speakers by using auxiliary information about that speaker. Such systems typically consist of a speaker embedder network and a speaker separator network. Although self-attention mechanisms have shown remarkable performance in speech processing tasks, including target speaker extraction, their high memory usage and computational complexity make real-time operation challenging. To address these limitations, we integrate a linear self-attention mechanism into the separator network, which significantly reduces memory and computational costs and makes the system more suitable for real-time applications. Furthermore, we compare the performance of this linear self-attention-based speaker extraction system with that of a system using memory-efficient self-attention. Experimental results on two-speaker, three-speaker, and noisy two-speaker mixtures show that linear self-attention not only improves speaker extraction performance compared with both traditional and memory-efficient self-attention but also significantly reduces the real-time factor and computational cost.
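The abstract does not say which linear or memory-efficient self-attention variants the separator uses, so the following is only a rough NumPy sketch of the general ideas, not the authors' implementation. It contrasts three single-head, non-causal variants over a sequence of T frames: standard softmax attention, which materializes a T x T score matrix; a chunked "memory-efficient" version that returns the same softmax result while holding only a T x chunk block of scores at a time; and kernelized linear attention with the feature map phi(x) = elu(x) + 1, which computes phi(K)^T V once so that memory and compute grow linearly in T. The feature map, the chunk size, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def softmax_attention(q, k, v):
    """Standard scaled dot-product attention; builds a full (T x T) score matrix."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (T, T): memory quadratic in T
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # (T, d_v)


def chunked_softmax_attention(q, k, v, chunk=64):
    """Exact softmax attention over key/value chunks with a running max and sum,
    so only a (T x chunk) block of scores is held in memory at a time."""
    d = q.shape[-1]
    t_q = q.shape[0]
    out = np.zeros((t_q, v.shape[-1]))
    running_max = np.full((t_q, 1), -np.inf)
    running_sum = np.zeros((t_q, 1))
    for start in range(0, k.shape[0], chunk):
        kc, vc = k[start:start + chunk], v[start:start + chunk]
        s = q @ kc.T / np.sqrt(d)                  # (T, chunk) score block
        new_max = np.maximum(running_max, s.max(axis=-1, keepdims=True))
        scale = np.exp(running_max - new_max)      # rescale earlier partial results
        p = np.exp(s - new_max)
        out = out * scale + p @ vc
        running_sum = running_sum * scale + p.sum(axis=-1, keepdims=True)
        running_max = new_max
    return out / running_sum


def elu_feature_map(x):
    """phi(x) = elu(x) + 1: x + 1 for x > 0, exp(x) otherwise (always positive)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v, eps=1e-6):
    """Kernelized linear attention, phi(Q) (phi(K)^T V); never forms a T x T matrix."""
    phi_q, phi_k = elu_feature_map(q), elu_feature_map(k)
    kv = phi_k.T @ v                               # (d, d_v): size independent of T
    z = phi_k.sum(axis=0)                          # (d,) normalizer terms
    return (phi_q @ kv) / (phi_q @ z + eps)[:, None]


def quadratic_kernel_attention(q, k, v, eps=1e-6):
    """Reference for linear_attention: same kernel, but with an explicit T x T matrix."""
    a = elu_feature_map(q) @ elu_feature_map(k).T  # (T, T)
    return (a @ v) / (a.sum(axis=-1, keepdims=True) + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d = 200, 16
    q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
    print(linear_attention(q, k, v).shape)
```

In this sketch the chunked variant reduces peak memory but still performs O(T^2 d) operations, whereas the linear variant reduces both memory and compute to O(T d^2), which is consistent with the abstract's point that linear self-attention lowers the real-time factor and computational cost as well as memory use. Note that linear attention is not numerically equivalent to softmax attention; it replaces the softmax kernel with phi(q) . phi(k).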
Author(s)
Sinha, Ragini  
Fraunhofer-Institut für Digitale Medientechnologie IDMT  
Rollwage, Christian  
Fraunhofer-Institut für Digitale Medientechnologie IDMT  
Doclo, Simon  
Fraunhofer-Institut für Digitale Medientechnologie IDMT  
Mainwork
33rd European Signal Processing Conference (EUSIPCO) 2025. Proceedings  
Conference
European Signal Processing Conference 2025  
DOI
10.23919/EUSIPCO63237.2025.11226272
Language
English
Fraunhofer-Institut für Digitale Medientechnologie IDMT  
Keyword(s)
  • conformer
  • efficient self-attention
  • real-time
  • Target speaker extraction
  • TCN