Personalized natural speech synthesis based on retrieval of pitch patterns using hierarchical Fujisaki model

2013 
In recent years, speech synthesis based on Hidden Markov Model (HMM) has been developed, which can synthesize stable and intelligible speech with flexibility and small footprint. However, synthesized prosodic features are still incapable to convey personalization and natural property. Previous prosody models, mainly constructed from the clustered prosodic features, are unable to characterize personalized prosodic information as the linguistic cues of the input sentence are indistinguishable for all speakers. An approach to retrieval of personalized pitch patterns from the real speech corpus of the target speaker, is proposed, incorporating with the HMM-based speech synthesizer, to generate a personalized natural pitch contour. The modified Fujisaki model is adopted to depict the hierarchical pitch patterns, aiming to model local pitch contour variation and global intonation of utterances in the corpus. The codeword sequences of utterances in the training and the synthesized corpora are constructed and used to obtain the relationship of pitch patterns between the real and synthesized speech. Finally, a language model of pitch pattern is constructed to obtain an optimal pitch pattern sequence of the input sentence. The experimental results using subjective and objective evaluations demonstrated the proposed approach can substantially outperform the conventional statistical synthesis methods, in terms of naturalness and speaker similarity.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    15
    References
    5
    Citations
    NaN
    KQI
    []