Dictation of Japanese Speech Based on Kana and Kanji Character String

2009 
In this paper, a character-based Japanese dictation method is proposed. The method is based on the kana and kanji string language model proposed by Ito et al. First, sentences in the training corpus are split into character-based units (CBUs). Then strings of CBUs (CBUSes) are chosen from the CBU corpus based on a statistical criterion. We examined three criteria for CBUS selection: frequency-based selection, mutual-information-based selection, and their combination. The experimental results showed that the combined method gave the best result (7.19% and 8.75% CBU error rates for the 20k and 60k word vocabulary conditions, respectively), which was better than the ordinary word-based method (7.61% and 9.15% CBU error rates for the 20k and 60k word vocabulary conditions, respectively). In addition, we carried out a recognition experiment on the Corpus of Spontaneous Japanese to confirm that the proposed method is effective not only for read speech but also for spontaneous speech. In that experiment, the frequency-based method gave the best result (29.82%), which is better than the word-based recognition result (32.80%).
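
The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each CBU is a single character, that candidate CBUSes are adjacent CBU pairs only, that mutual information is computed as pointwise mutual information, and that the combined criterion keeps a pair only when it passes both tests. The function names and the thresholds `min_count` and `min_pmi` are hypothetical.

```python
import math
from collections import Counter


def split_into_cbus(sentence):
    """Split a sentence into units. Assumption: one character per CBU."""
    return [ch for ch in sentence if not ch.isspace()]


def pair_statistics(corpus):
    """Count single CBUs and adjacent CBU pairs over the whole corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        cbus = split_into_cbus(sentence)
        unigrams.update(cbus)
        bigrams.update(zip(cbus, cbus[1:]))
    return unigrams, bigrams


def pmi(pair, unigrams, bigrams):
    """Pointwise mutual information of an adjacent CBU pair."""
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = bigrams[pair] / n_bi
    p_x = unigrams[pair[0]] / n_uni
    p_y = unigrams[pair[1]] / n_uni
    return math.log(p_xy / (p_x * p_y))


def select_cbus_pairs(corpus, criterion="combined", min_count=2, min_pmi=0.0):
    """Return the CBU pairs kept as CBUSes under the chosen criterion.

    criterion is "frequency", "mi", or "combined" (assumed here to mean
    that both the frequency test and the PMI test must pass).
    """
    unigrams, bigrams = pair_statistics(corpus)
    selected = []
    for pair, count in bigrams.items():
        by_freq = count >= min_count
        by_mi = pmi(pair, unigrams, bigrams) >= min_pmi
        keep = {
            "frequency": by_freq,
            "mi": by_mi,
            "combined": by_freq and by_mi,
        }[criterion]
        if keep:
            selected.append("".join(pair))
    return sorted(selected)


if __name__ == "__main__":
    toy_corpus = ["東京都", "東京駅", "京都"]
    for name in ("frequency", "mi", "combined"):
        print(name, select_cbus_pairs(toy_corpus, criterion=name))
```

In a full system the selected strings would be added to the recognition vocabulary and the language model retrained on the re-segmented corpus; that part is outside the scope of this sketch.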