A word-level token-passing decoder for subword n-gram LVCSR

2014 
The decoder is a key component of any modern speech recognizer. Morphologically rich languages pose special challenges for decoder design, as a very large recognition vocabulary is required to avoid high out-of-vocabulary (OOV) rates. To alleviate this, n-gram models are often trained over subwords instead of words; a subword n-gram model can assign probabilities to unseen word forms. We review token-passing decoding and suggest a novel way of creating the decoding graph for subword n-grams at the word level. This approach offers better control over the recognition vocabulary, including the removal of nonsense words and the possibility of adding important OOV words to the graph. The different decoders are evaluated in a Finnish large vocabulary continuous speech recognition (LVCSR) task.
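To make the two core ideas concrete, the sketch below is a purely illustrative Python example, not the paper's implementation; the bigram table, graph arcs, and scores are all hypothetical. It shows (a) how a subword n-gram model scores a whole word by segmenting it into subwords and chaining their conditional probabilities, which is what lets it handle unseen word forms, and (b) one simplified token-passing step over a word-level decoding graph with Viterbi recombination per state. A real decoder would pass tokens frame by frame through HMM states and use a higher-order n-gram with proper backoff.

```python
import math
from dataclasses import dataclass

# Hypothetical subword bigram: log P(subword | previous subword).
# The paper would use a higher-order n-gram over morph-like units.
BIGRAM_LOGPROB = {
    ("<w>", "puhe"): math.log(0.2),
    ("puhe", "en"): math.log(0.4),
    ("en", "tunnistus"): math.log(0.3),
    ("tunnistus", "</w>"): math.log(0.5),
}

def word_logprob(subwords):
    """Score a whole word by chaining subword n-gram probabilities.

    This is what lets a subword LM assign a probability to a word form
    never seen as a unit: segment it into known subwords and multiply
    their conditional probabilities.
    """
    lp, prev = 0.0, "<w>"
    for sw in subwords + ["</w>"]:
        lp += BIGRAM_LOGPROB.get((prev, sw), math.log(1e-6))  # crude floor, not real backoff
        prev = sw
    return lp

@dataclass
class Token:
    logprob: float
    history: tuple  # words recognized so far

def propagate(active, arcs, acoustic_logprob):
    """One simplified token-passing step over word-level graph arcs.

    `active` maps graph state -> best token at that state; `arcs` holds
    (src_state, dst_state, word, subwords) tuples. Every token is
    extended along each outgoing arc, adding an acoustic score and the
    subword-LM score of the word; only the best-scoring token per
    destination state is kept (Viterbi recombination).
    """
    nxt = {}
    for src, dst, word, subwords in arcs:
        tok = active.get(src)
        if tok is None:
            continue
        score = tok.logprob + acoustic_logprob(word) + word_logprob(subwords)
        if dst not in nxt or score > nxt[dst].logprob:
            nxt[dst] = Token(score, tok.history + (word,))
    return nxt

if __name__ == "__main__":
    # One word-level arc whose word is scored through its subwords.
    arcs = [(0, 1, "puheentunnistus", ["puhe", "en", "tunnistus"])]
    active = {0: Token(0.0, ())}
    active = propagate(active, arcs, acoustic_logprob=lambda w: -5.0)
    print(active[1].history, round(active[1].logprob, 3))
```

Under this reading, building the graph at the word level means each arc carries a complete word whose LM score comes from its subword segmentation, so nonsense subword sequences can be excluded from the graph while important OOV words can still be added explicitly.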