Fraunhofer-Gesellschaft
2022
Conference Paper
Title

Speech Separation for an Unknown Number of Speakers Using Transformers with Encoder-Decoder Attractors

Abstract
This paper considers speaker-independent speech separation in the waveform domain for single-channel mixtures with an unknown number of speakers. To deal with the unknown number of sources, we incorporate an encoder-decoder attractor (EDA) module into a speech separation network. The neural network architecture consists of a trainable encoder-decoder pair and a masking network. The masking network is inspired by the transformer-based SepFormer separation system and contains a dual-path block and a triple-path block, each modeling both short-term and long-term dependencies in the signal. The EDA module first summarises the dual-path block output using an LSTM encoder and then generates one attractor vector per speaker in the mixture using an LSTM decoder. The attractors are combined with the dual-path block output to generate per-speaker channels, which are processed jointly by the triple-path block to predict the masks. Further, a linear-sigmoid layer, with the attractors as input, predicts a binary output that serves as a stopping criterion for attractor generation. The proposed approach is evaluated on the WSJ0-mix dataset with mixtures of up to five speakers and obtains state-of-the-art results in both separation quality and speaker counting for all mixtures.
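The encoder-decoder attractor flow described in the abstract can be sketched schematically. The following minimal NumPy sketch is an illustration, not the authors' implementation: the hand-rolled LSTM cell, the weight initialisation, and all dimensions are hypothetical. It shows the control flow only: an LSTM encoder summarises the frame sequence, an LSTM decoder emits one attractor per step, and a linear-sigmoid head on each attractor decides when to stop.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal single-step LSTM cell (input, forget, cell, output gates)."""
    def __init__(self, in_dim, hid_dim):
        # One stacked weight matrix for all four gates; random, untrained.
        self.W = rng.standard_normal((4 * hid_dim, in_dim + hid_dim)) * 0.1
        self.b = np.zeros(4 * hid_dim)
        self.hid_dim = hid_dim

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)      # new cell state
        h = o * np.tanh(c)              # new hidden state
        return h, c

def eda(frames, enc, dec, stop_w, stop_b, max_spk=6):
    """Encoder-decoder attractor sketch: summarise the frame sequence,
    then emit one attractor per speaker until the stop head fires."""
    D = enc.hid_dim
    h = c = np.zeros(D)
    for x in frames:                    # LSTM encoder consumes all frames
        h, c = enc.step(x, h, c)
    attractors = []
    zero = np.zeros(D)                  # decoder is fed a zero input each step
    for _ in range(max_spk):
        h, c = dec.step(zero, h, c)     # LSTM decoder emits one attractor
        p_exist = sigmoid(stop_w @ h + stop_b)   # linear-sigmoid stop head
        if p_exist < 0.5:               # stopping criterion: no more speakers
            break
        attractors.append(h)
    return np.stack(attractors) if attractors else np.empty((0, D))
```

In the paper the encoder input is the dual-path block output rather than raw frames, and the attractors are then combined with that output to form per-speaker channels for the triple-path block; the sketch above covers only the attractor-generation and stopping logic.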
Author(s)
Chetupalli, Srikanth Raj
Fraunhofer-Institut für Integrierte Schaltungen IIS  
Habets, Emanuël Anco Peter
Mainwork
Interspeech 2022  
Conference
International Speech Communication Association (INTERSPEECH Annual Conference) 2022  
DOI
10.21437/Interspeech.2022-10849
Language
English
Fraunhofer-Institut für Integrierte Schaltungen IIS  
Keyword(s)
  • attractors
  • source separation
  • speaker counting
  • transformers
