Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS

Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

2020

Tacotron-based end-to-end speech synthesis has shown remarkable voice quality. However, the rendering of prosody in the synthesized speech still needs improvement, especially for long sentences, where prosodic phrasing errors occur frequently. In this paper, we extend the Tacotron-based speech synthesis framework to explicitly model prosodic phrase breaks. We propose a multi-task learning scheme for Tacotron training that optimizes the system to predict both the Mel spectrum and phrase breaks. To the best of our knowledge, this is the first implementation of multi-task learning for Tacotron-based TTS with a prosodic phrasing model. Experiments show that the proposed training scheme consistently improves voice quality for both Chinese and Mongolian systems.
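The abstract does not state the exact form of the joint objective. One common way to set up such a multi-task scheme is to add a weighted phrase-break classification loss to the Mel-spectrum reconstruction loss, i.e. L = L_mel + weight * L_break. The sketch below is illustrative only and assumes mean squared error over Mel frames, binary cross-entropy over per-token break labels (1 = break after the token), and a hypothetical weighting parameter `weight`; none of these choices are taken from the paper.

```python
import math
from typing import Sequence


def mel_loss(pred: Sequence[Sequence[float]], target: Sequence[Sequence[float]]) -> float:
    """Assumed reconstruction loss: mean squared error over all frames and Mel bins."""
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    return total / count


def break_loss(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Assumed break loss: binary cross-entropy over per-token phrase-break labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def multitask_loss(mel_pred, mel_target, break_probs, break_labels, weight: float = 0.5) -> float:
    """Hypothetical joint objective: Mel loss plus a weighted phrase-break loss."""
    return mel_loss(mel_pred, mel_target) + weight * break_loss(break_probs, break_labels)


if __name__ == "__main__":
    mel_pred = [[0.0, 1.0], [2.0, 3.0]]
    mel_target = [[0.0, 1.0], [2.0, 2.0]]
    print(multitask_loss(mel_pred, mel_target, [0.5, 0.5], [1, 0]))
```

Raising `weight` pushes training toward getting phrase breaks right at some cost to spectral reconstruction; in practice such a weight would be tuned on held-out data.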

Keywords:

Phrase
Multi-task learning
Computer science
Prosody
Speech recognition
Speech synthesis
Rendering (computer graphics)