Alternative networks for monolingual bottleneck features

2017 
While recent advances in deep neural networks have led to significant improvements in speech recognition, they have been applied mainly to acoustic and language modeling. We instead apply these models to bottleneck feature extraction. Several DNN-, CNN-, and BLSTM-based bottleneck feature networks are compared using both DNN and BLSTM acoustic models. Multiple variations in network architecture and feature input are explored. Results are reported on four languages from the IARPA Babel program. The shallow CNN and the BLSTM both improve performance by a similar amount. The best network is a deep CNN, which improves WER by 1.4% and ATWV by 2% absolute over the baseline DNN network when using a DNN acoustic model. Relative gains hold when using stronger BLSTM acoustic models.
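The abstract does not give the exact layer widths or training recipe, but the basic bottleneck idea can be illustrated with a minimal sketch: a feed-forward network with one narrow hidden layer is trained to classify acoustic targets, and after training the classifier head is discarded so the narrow layer's activations serve as features for a separate acoustic model. The PyTorch sketch below uses hypothetical dimensions (input, hidden, bottleneck, and target sizes are assumptions, not values from the paper).

```python
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    """Feed-forward bottleneck network (hypothetical sizes; the paper's
    exact architecture is not specified in the abstract)."""

    def __init__(self, input_dim=440, hidden_dim=1024,
                 bottleneck_dim=40, num_targets=3000):
        super().__init__()
        # Hidden layers preceding the narrow bottleneck.
        self.pre = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        # Low-dimensional bottleneck layer; its activations become
        # the features fed to the downstream acoustic model.
        self.bottleneck = nn.Linear(hidden_dim, bottleneck_dim)
        # Layers after the bottleneck, ending in classification targets
        # (e.g. context-dependent states) used only during training.
        self.post = nn.Sequential(
            nn.Sigmoid(),
            nn.Linear(bottleneck_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, num_targets),
        )

    def forward(self, x):
        return self.post(self.bottleneck(self.pre(x)))

    def extract_features(self, x):
        # After training, keep only the bottleneck activations.
        with torch.no_grad():
            return self.bottleneck(self.pre(x))

# Usage: extract 40-dimensional features for a batch of spliced frames.
net = BottleneckDNN()
frames = torch.randn(8, 440)          # hypothetical batch of inputs
feats = net.extract_features(frames)  # shape (8, 40)
```

The same pattern carries over to the CNN and BLSTM variants compared in the paper: only the layers before the bottleneck change, while the narrow feature-producing layer and the discarded classifier head stay the same.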