Slovenian spontaneous speech recognition and acoustic modeling of filled pauses and onomatopoeias

2008 
This paper focuses on acoustic modeling for spontaneous speech recognition, which remains a challenging task for the speech technology research community. The attributes of spontaneous speech can heavily degrade a speech recognizer's accuracy and performance. Filled pauses and onomatopoeias are among the most important of these attributes and can considerably reduce accuracy. Although filled pauses do not carry any semantic information, they are still very important from the modeling perspective. A novel acoustic modeling approach is proposed in this paper, in which filled pauses are modeled using phonetic broad classes that correspond to their acoustic-phonetic properties. The phonetic broad classes are language dependent and can be defined by an expert or in a data-driven way. The new filled-pause modeling approach is compared with three other implicit filled-pause modeling methods. All experiments were carried out using a speech recognition system based on context-dependent Hidden Markov Models. The Slovenian BNSI Broadcast News speech and text database, which contains manually transcribed recordings of TV news shows, was used for training and evaluation. The proposed acoustic modeling approach was evaluated on a set of spontaneous speech. The best overall filled-pause acoustic modeling approach improved the speech recognizer's word accuracy by 5.70% relative to the baseline system, without affecting the recognition time.
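The idea of describing filled pauses with phonetic broad classes, and the way a relative accuracy gain is computed, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the broad-class inventory, the phone symbols, and the function names `filled_pause_to_classes` and `relative_gain` are hypothetical and are not taken from the paper, which defines its own Slovenian classes.

```python
# Hypothetical broad-class inventory (illustrative only; not the paper's classes).
BROAD_CLASSES = {
    "vowel": {"a", "e", "i", "o", "u", "@"},
    "nasal": {"m", "n"},
    "fricative": {"s", "z", "h", "f"},
}

# Invert the inventory into a phone -> broad-class lookup table.
PHONE_TO_CLASS = {
    phone: name for name, phones in BROAD_CLASSES.items() for phone in phones
}


def filled_pause_to_classes(phones):
    """Replace each phone of a filled pause with its broad-class label."""
    return [PHONE_TO_CLASS[p] for p in phones]


def relative_gain(baseline_acc, new_acc):
    """Relative improvement in percent: 100 * (new - baseline) / baseline."""
    return 100.0 * (new_acc - baseline_acc) / baseline_acc


if __name__ == "__main__":
    # A filled pause transcribed (hypothetically) as schwa followed by /m/.
    print(filled_pause_to_classes(["@", "m"]))
    # Made-up accuracies, used only to show the arithmetic.
    print(relative_gain(50.0, 55.0))
```

In a setup like this, the acoustic model for a filled pause would be built from units shared by all phones in a class rather than from individual phone models. The accuracies passed to `relative_gain` above are invented numbers and do not reproduce the reported 5.70% result.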