Lecture speech recognition using discrete-mixture HMMs

2011 
Most state-of-the-art speech recognition systems use continuous-mixture hidden Markov models (CMHMMs) as acoustic models. Discrete hidden Markov model (DHMM) systems, on the other hand, are known to perform poorly because they suffer from quantization distortion. In this paper, we present an efficient acoustic modeling approach based on discrete distributions for large-vocabulary continuous speech recognition (LVCSR). In our previous work, we proposed maximum a posteriori (MAP) estimation of discrete-mixture hidden Markov model (DMHMM) parameters and showed that the DMHMM system outperformed the conventional CMHMM system in noisy conditions. However, those recognition experiments were conducted on a read-speech task with a vocabulary of only 5k words, and the DMHMM was not effective in the clean condition. In this paper, we develop a DMHMM-based LVCSR system and evaluate it on a more difficult task under the clean condition. In Japan, the large-scale spontaneous speech database ‘Corpus of Spontaneous Japanese’ is used as the common evaluation database for spontaneous speech, and we used it in our experiments. The results show that the DMHMM system achieves almost the same performance as the CMHMM system, and that a further improvement can be obtained with a histogram equalization method. Copyright © 2010 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
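To make the modeling idea concrete, the sketch below shows one common way a DMHMM-style state emission probability can be computed: the feature vector is split into sub-vectors, each sub-vector is mapped to a codeword by its own codebook, and the state likelihood is a weighted mixture over products of per-stream discrete probabilities. A simple quantile-based histogram equalization mapping is also included, since the abstract mentions HEQ as the source of the additional gain. This is an illustrative, assumption-laden sketch, not the paper's implementation; all function names, shapes, and toy parameters (e.g. 13-dimensional frames split 6 + 7, 16-entry codebooks, two mixtures) are invented for the example.

```python
"""Illustrative sketch of a discrete-mixture HMM (DMHMM) emission probability
computed from sub-vector-quantized features, plus a simple quantile-based
histogram-equalization (HEQ) feature mapping.  Names and shapes are
assumptions for illustration only, not the paper's implementation."""

import numpy as np
from scipy.stats import norm


def quantize_subvectors(frame, codebooks):
    """Map each sub-vector of one feature frame to its nearest codeword index.

    frame     : (D,) feature vector, split into len(codebooks) sub-vectors.
    codebooks : list of (K_s, d_s) arrays whose d_s values sum to D.
    """
    indices, start = [], 0
    for cb in codebooks:
        d_s = cb.shape[1]
        sub = frame[start:start + d_s]
        # Nearest codeword under Euclidean distance.
        indices.append(int(np.argmin(np.sum((cb - sub) ** 2, axis=1))))
        start += d_s
    return indices


def dmhmm_emission(indices, weights, tables):
    """Discrete-mixture emission likelihood for one HMM state.

    indices : codeword indices from quantize_subvectors().
    weights : (M,) mixture weights summing to 1.
    tables  : list of (M, K_s) arrays; tables[s][m, k] is the discrete
              probability of codeword k in stream s under mixture m.
    Returns b(o) = sum_m w_m * prod_s P_{m,s}(k_s).
    """
    likelihood = 0.0
    for m in range(len(weights)):
        p = weights[m]
        for s, k in enumerate(indices):
            p *= tables[s][m, k]
        likelihood += p
    return likelihood


def histogram_equalize(x, ref_cdf_inverse=norm.ppf):
    """Map a 1-D feature sequence to a reference distribution (standard
    normal here) via its empirical CDF -- a basic quantile-based HEQ."""
    ranks = np.argsort(np.argsort(x))
    cdf = (ranks + 0.5) / len(x)
    return ref_cdf_inverse(cdf)


# Toy usage with random parameters (illustration only).
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(16, 6)), rng.normal(size=(16, 7))]
frame = rng.normal(size=13)                       # e.g. 13 MFCCs split 6 + 7
idx = quantize_subvectors(frame, codebooks)
weights = np.array([0.6, 0.4])
tables = [rng.dirichlet(np.ones(16), size=2) for _ in codebooks]
print(dmhmm_emission(idx, weights, tables))
```

The design point the sketch tries to convey is that, unlike a CMHMM whose Gaussian mixtures evaluate densities directly on the continuous features, a discrete-mixture state only looks up per-stream probability tables after quantization, which is why quantization quality (and feature normalization such as HEQ) matters so much for DHMM/DMHMM systems.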