Data-driven pause prediction for synthesis of storytelling style speech based on discourse modes

2015 
In storytelling, a storyteller generally uses prosodic variation and subtle speech nuances to help listeners follow the story. This is achieved by emphasizing prominent words, conveying various emotions, mimicking voices and inserting appropriate pauses. This work is part of building Story Text-to-Speech (TTS) [1] synthesis systems in Indian languages, which aim to synthesize storytelling style speech from neutral TTS. The neutral speech is converted to storytelling style by modifying specific prosodic parameters (i.e., duration, pitch, tempo, intensity and pauses). The main contribution of this paper is to model the pause patterns present in storytelling style speech based on the modes of discourse (narrative, descriptive and dialogue) in order to capture story-semantic information. The analysis of pause patterns is carried out on children's stories in Hindi. We analyzed the pause patterns and classified pauses into three categories (short, medium and long) for each mode of discourse. A three-stage data-driven method is proposed to predict the position and duration of the pauses. We conducted objective tests to evaluate the performance of the proposed method at each stage. A subjective evaluation was also carried out on the final output of the Hindi Story-TTS system. The subjective evaluation indicates that the subjects perceived an improvement in speech quality in terms of storytelling style.