State-dependent phoneme-based model merging for dialectal Chinese speech recognition

2008 
This paper discusses and evaluates a novel yet simple and effective acoustic modeling method called "state-dependent phoneme-based model merging (SDPBMM)", used to build a dialectal Chinese speech recognizer from a small amount of dialectal Chinese speech. In SDPBMM, state-level pronunciation modeling is performed by merging a tied-state of standard triphones with a state of dialectal monophone(s). The state-level pronunciation modeling that serves as the merging criterion for SDPBMM suffers from data sparseness because of the limited dialectal data set; to overcome this problem, a distance-based pronunciation modeling approach is also proposed. With 40 minutes of Shanghai-dialectal Chinese speech data, SDPBMM achieves a significant absolute syllable error rate (SER) reduction of approximately 7.1% (a relative SER reduction of 14.3%) on Shanghai-dialectal Chinese, with no performance degradation on standard Chinese. Experiments show that SDPBMM outperforms Maximum Likelihood Linear Regression (MLLR) adaptation and pooled retraining by 1.4% and 5.3% SER, respectively. When further combined with MLLR adaptation, SDPBMM achieves an additional absolute SER reduction of 1.4%.
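To make the merging step more concrete, below is a minimal sketch assuming conventional GMM-HMM acoustic models. The abstract does not specify the exact merging rule, so the component-pooling scheme and the interpolation weight `alpha` are illustrative assumptions rather than the paper's actual SDPBMM formulation; the sketch only shows the general idea of combining a standard-triphone tied-state with a dialectal monophone state.

```python
# Illustrative sketch of state-level model merging for GMM-HMM acoustic models.
# The pooling scheme and `alpha` are assumptions, not the paper's exact method.
from dataclasses import dataclass

import numpy as np


@dataclass
class GMMState:
    """One HMM emitting state: a Gaussian mixture with diagonal covariances."""
    weights: np.ndarray    # shape (M,)
    means: np.ndarray      # shape (M, D)
    variances: np.ndarray  # shape (M, D)


def merge_states(standard: GMMState, dialectal: GMMState,
                 alpha: float = 0.5) -> GMMState:
    """Pool the components of a standard-triphone tied-state with those of a
    dialectal monophone state, interpolating mixture weights by alpha."""
    weights = np.concatenate([alpha * standard.weights,
                              (1.0 - alpha) * dialectal.weights])
    weights /= weights.sum()  # renormalize so the merged mixture sums to 1
    return GMMState(
        weights=weights,
        means=np.vstack([standard.means, dialectal.means]),
        variances=np.vstack([standard.variances, dialectal.variances]),
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    std = GMMState(np.array([0.6, 0.4]), rng.normal(size=(2, 3)), np.ones((2, 3)))
    dia = GMMState(np.array([1.0]), rng.normal(size=(1, 3)), np.ones((1, 3)))
    merged = merge_states(std, dia, alpha=0.7)
    print(merged.weights)  # [0.42 0.28 0.30]
```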