Data-driven pause prediction for speech synthesis in storytelling style speech

2015 
In storyteller speech, pauses play a significant role in introducing suspense and climax. Pauses are used to emphasize keywords and emotion-salient words and to separate the phrases in an utterance. The objective of this work is to predict the position and duration of pauses in speech synthesized by a text-to-speech system. We analyzed the pause patterns in storyteller speech and classified the pauses into three categories: short, medium and long. A data-driven, three-stage pause prediction model is proposed. In the first stage, a model is built to identify pause positions within an utterance using a set of word-level features. In the second stage, the pauses are classified into the three categories using a set of syllable-level features. In the final stage, a regression predictor is trained to predict the pause duration for each category. We conducted both objective and subjective tests to evaluate the proposed method. The subjective evaluation showed that subjects perceive a noticeable difference in speech synthesized using the proposed method.
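
The three-stage cascade described in the abstract can be sketched roughly as follows. This is only an illustration of the control flow (position, then category, then a per-category duration), not the authors' implementation. All names here (`predict_pauses`, `toy_has_pause`, `toy_classify`, `TOY_REGRESSORS`) are hypothetical, the punctuation rules stand in for the trained word-level and syllable-level models, and the millisecond values are made-up placeholders rather than results from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Stage callables take the word sequence and the index of the word
# that the candidate pause would follow.
PositionModel = Callable[[Sequence[str], int], bool]
CategoryModel = Callable[[Sequence[str], int], str]
DurationModel = Callable[[Sequence[str], int], float]


@dataclass
class PausePrediction:
    word_index: int     # the pause follows words[word_index]
    category: str       # "short", "medium" or "long"
    duration_ms: float  # predicted pause length in milliseconds


def predict_pauses(
    words: Sequence[str],
    has_pause: PositionModel,
    classify: CategoryModel,
    regressors: Dict[str, DurationModel],
) -> List[PausePrediction]:
    """Run the three stages in sequence over every word boundary."""
    predictions = []
    # Only boundaries inside the utterance are considered.
    for i in range(len(words) - 1):
        if not has_pause(words, i):           # stage 1: pause position
            continue
        category = classify(words, i)         # stage 2: pause category
        duration = regressors[category](words, i)  # stage 3: duration
        predictions.append(PausePrediction(i, category, duration))
    return predictions


# Toy stand-ins (assumptions): punctuation replaces the learned models.
def toy_has_pause(words: Sequence[str], i: int) -> bool:
    return words[i][-1] in ",;.!?"


def toy_classify(words: Sequence[str], i: int) -> str:
    last = words[i][-1]
    if last == ",":
        return "short"
    if last == ";":
        return "medium"
    return "long"


# One duration predictor per category; constants are placeholders.
TOY_REGRESSORS: Dict[str, DurationModel] = {
    "short": lambda words, i: 150.0,
    "medium": lambda words, i: 400.0,
    "long": lambda words, i: 800.0,
}
```

Calling `predict_pauses("Once upon a time, there lived a king. He was wise".split(), toy_has_pause, toy_classify, TOY_REGRESSORS)` would yield one short pause after "time," and one long pause after "king.". In a real system each stand-in would be replaced by a model trained on annotated storyteller speech.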