Research On Synthesis Of Speech Parameter And Emotional Speech For Malay Language Using LSTM RNN

2018 
As styles of language expression become increasingly liberalized and diversified, the advantages of deep learning models for speech synthesis are becoming more apparent. However, most current studies focus on widely used languages such as Chinese and English, and there is little research on minority languages. To this end, this paper studies speech parameter generation and emotional speech synthesis for Malay. We first used a recurrent neural network (RNN) to capture dependency features in Malay, and established a parametric model from multivariate feature matrices of Malay texts using long short-term memory (LSTM). In speech synthesis, most inputs are audio and the corresponding triphone models obtained after a series of segmentation steps, and few emotional components remain in the segmented results. This paper therefore used an LSTM RNN to model the waveform of Malay speech directly and to preserve as much emotion as possible. Experimental results on real-life data showed that Malay speech parameter synthesis based on the LSTM RNN model achieved satisfactory performance, with improvements of 1.16 and 0.25 on two indexes respectively, and that applying the model to Malay emotional speech synthesis reached a precision of 85.46%.
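
To illustrate how an LSTM carries information across a sequence of speech frames, the following is a minimal sketch of a generic LSTM forward pass in plain Python. It is not the model described in the paper: the abstract gives no architecture details, so the names `lstm_step`, `init_params`, and `run_lstm`, the layer sizes, and the random untrained weights are all assumptions made for illustration only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(input_size, hidden_size, seed=0):
    """Random, untrained weights for the four LSTM gates (illustrative only)."""
    rng = random.Random(seed)

    def gate():
        # Each gate sees the current input concatenated with the previous hidden state.
        weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
            for _ in range(hidden_size)
        ]
        bias = [0.0] * hidden_size
        return weights, bias

    return {name: gate() for name in ("input", "forget", "output", "cand")}


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; all vectors are plain Python lists."""
    z = x + h_prev  # list concatenation: [x; h_prev]

    def affine(name):
        weights, bias = params[name]
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(weights, bias)]

    i = [sigmoid(a) for a in affine("input")]    # input gate
    f = [sigmoid(a) for a in affine("forget")]   # forget gate
    o = [sigmoid(a) for a in affine("output")]   # output gate
    g = [math.tanh(a) for a in affine("cand")]   # candidate cell values

    # The cell state mixes retained memory with new candidate content.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def run_lstm(frames, hidden_size, seed=0):
    """Run the cell over a sequence of feature frames; return one hidden vector per frame."""
    params = init_params(len(frames[0]), hidden_size, seed)
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in frames:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs
```

Calling `run_lstm` on a list of equal-length feature frames returns one hidden vector per frame. The cell state `c` is what lets earlier frames influence later outputs, which is the dependency-capturing property the abstract refers to. A trained system would learn the weights from data and add an output layer mapping hidden states to speech parameters.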