Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis

Rui Liu,Berrak Sisman,Feilong Bao,Jichen Yang,Guanglai Gao,Haizhou Li

Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis

2021

Prosodic phrasing is an important factor that affects naturalness and intelligibility in text-to-speech synthesis. Studies show that deep learning techniques improve prosodic phrasing when large text and speech corpus are available. However, for low-resource languages, such as Mongolian, prosodic phrasing remains a challenge for various reasons. First, the database suitable for system training is limited; Second, word composition knowledge that is prosody-informing has not been used in prosodic phrase modeling. To address these problems, in this paper, we propose a feature augmentation method in conjunction with a self-attention neural classifier. We augment input text with morphological and phonological decompositions of words to enhance the text encoder. We study the use of self-attention classifier, that makes use of global context of a sentence, as a decoder for phrase break prediction. Both objective and subjective evaluations validate the effectiveness of the proposed phrase break prediction framework, that consistently improves voice quality in a Mongolian text-to-speech synthesis system.

Keywords:

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations