Modeling filled pauses for spontaneous speech recognition applications

2008 
This paper focuses on acoustic modeling for spontaneous speech recognition applications, which remains a very challenging task for the speech technology research community. The attributes of spontaneous speech can heavily degrade a speech recognizer's accuracy. Filled pauses and onomatopoeias are one such important attribute. A novel acoustic modeling approach is proposed in this paper, in which filled pauses are modeled using phonetic broad classes that correspond to their acoustic-phonetic properties. The new modeling approach is compared with three other filled-pause modeling methods. All experiments were carried out using a speech recognition system based on context-dependent Hidden Markov Models. The Slovenian BNSI Broadcast News speech and text database was used for training and evaluation. The database contains manually transcribed recordings of TV news shows. The proposed acoustic modeling approach was evaluated on a set of spontaneous speech. The overall best acoustic modeling of filled pauses improved the speech recognizer's word accuracy by 5.70% relative to the baseline system.