2023
Conference Paper
Title

Multi-Speaker Text-to-Speech Using ForwardTacotron with Improved Duration Prediction

Abstract
Several non-autoregressive methods for fast and efficient text-to-speech synthesis have been proposed. Most of these use a duration predictor to estimate the temporal sequence of phonemes in the speech. This duration prediction is based on the input phoneme sequence in a speaker-independent fashion. The resulting constant speech pace across speakers is unnatural, since every speaker has a characteristic speaking rate. This paper proposes an extension of the multi-speaker ForwardTacotron to learn this aspect with trainable speaker embeddings. The duration of synthesized speech from the proposed model across multiple speakers is much closer to the durations of speech synthesized by a baseline autoregressive model. The proposed extension yields marginal improvements in intelligibility as measured through an automated semantically unpredictable sentence test. Further, we show through a listening test that speech rhythm does not play a significant part in the perceptual quality assessment.
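The idea of speaker-dependent duration prediction can be illustrated with a toy sketch. It is not the paper's architecture: the class `SpeakerAwareDurationPredictor`, the function `length_regulate`, the linear scoring, the softplus activation, and all dimensions are hypothetical choices made for illustration. The sketch assumes the speaker embedding is simply concatenated with each phoneme encoding before a duration is predicted, so the same phoneme sequence can yield a different number of output frames per speaker.

```python
import math
import random


def softplus(x: float) -> float:
    """Smooth, strictly positive activation so predicted durations stay > 0."""
    return math.log1p(math.exp(x))


class SpeakerAwareDurationPredictor:
    """Toy linear duration predictor conditioned on a speaker embedding.

    Hypothetical illustration only; weights are random, not trained.
    """

    def __init__(self, phoneme_dim, speaker_dim, num_speakers, seed=0):
        rng = random.Random(seed)
        # One embedding vector per speaker (trainable in a real model,
        # randomly initialised here).
        self.speaker_table = [
            [rng.uniform(-1.0, 1.0) for _ in range(speaker_dim)]
            for _ in range(num_speakers)
        ]
        # Linear weights over the concatenated [phoneme ; speaker] features.
        self.weights = [
            rng.uniform(-0.5, 0.5) for _ in range(phoneme_dim + speaker_dim)
        ]
        self.bias = 1.0

    def predict(self, phoneme_encodings, speaker_id):
        """Return one positive duration (in frames) per phoneme encoding."""
        speaker_vec = self.speaker_table[speaker_id]
        durations = []
        for enc in phoneme_encodings:
            features = list(enc) + speaker_vec
            score = sum(w * f for w, f in zip(self.weights, features)) + self.bias
            durations.append(softplus(score))
        return durations


def length_regulate(phoneme_encodings, durations):
    """Repeat each phoneme encoding for its rounded duration (at least one frame)."""
    frames = []
    for enc, dur in zip(phoneme_encodings, durations):
        frames.extend([enc] * max(1, round(dur)))
    return frames


# Usage: the same six phoneme encodings, expanded for two different speakers.
rng = random.Random(1)
phonemes = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(6)]
model = SpeakerAwareDurationPredictor(phoneme_dim=4, speaker_dim=3, num_speakers=2)

for speaker_id in range(2):
    durations = model.predict(phonemes, speaker_id)
    frames = length_regulate(phonemes, durations)
    print(f"speaker {speaker_id}: {len(frames)} frames")
```

In a speaker-independent predictor, the speaker vector would be absent from `features`, so every speaker would receive identical durations for the same phoneme sequence, which is the constant pace the abstract describes as unnatural.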
Author(s)
Kayyar Lakshminarayana, Kishor
Fraunhofer-Institut für Integrierte Schaltungen IIS  
Dittmar, Christian  
Fraunhofer-Institut für Integrierte Schaltungen IIS  
Pia, Nicola
Fraunhofer-Institut für Integrierte Schaltungen IIS  
Habets, Emanuel A.P.
Mainwork
Speech Communication. 15th ITG Conference 2023  
Conference
Conference on Speech Communication 2023  
DOI
10.30420/456164036
Language
English
Fraunhofer-Institut für Integrierte Schaltungen IIS  