Alternative networks for monolingual bottleneck features

2017 
While recent advances in deep neural networks have led to significant improvements in speech recognition, they have been applied mainly to acoustic and language modeling. We instead apply these models to bottleneck feature extraction. Several DNN-, CNN-, and BLSTM-based bottleneck feature networks are compared using both DNN and BLSTM acoustic models. Multiple variations in network architecture and feature input are explored. Results are reported on four languages from the IARPA Babel program. The shallow CNN and the BLSTM both improve performance by a similar amount. The best network is a deep CNN, which improves WER by 1.4% and ATWV by 2% absolute over the baseline DNN network when using a DNN acoustic model. Relative gains hold when using stronger BLSTM acoustic models.
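The abstract does not give the exact layer widths or training recipe, but the basic bottleneck idea can be illustrated with a minimal sketch: a feed-forward network with one narrow hidden layer is trained to classify acoustic targets, and after training the classifier head is discarded so the narrow layer's activations serve as features for a separate acoustic model. The PyTorch sketch below uses hypothetical dimensions (input, hidden, bottleneck, and target sizes are assumptions, not values from the paper).

```python
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    """Feed-forward bottleneck network (hypothetical sizes; the paper's
    exact architecture is not specified in the abstract)."""

    def __init__(self, input_dim=440, hidden_dim=1024,
                 bottleneck_dim=40, num_targets=3000):
        super().__init__()
        # Hidden layers preceding the narrow bottleneck.
        self.pre = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        # Low-dimensional bottleneck layer; its activations become
        # the features fed to the downstream acoustic model.
        self.bottleneck = nn.Linear(hidden_dim, bottleneck_dim)
        # Layers after the bottleneck, ending in classification targets
        # (e.g. context-dependent states) used only during training.
        self.post = nn.Sequential(
            nn.Sigmoid(),
            nn.Linear(bottleneck_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, num_targets),
        )

    def forward(self, x):
        return self.post(self.bottleneck(self.pre(x)))

    def extract_features(self, x):
        # After training, keep only the bottleneck activations.
        with torch.no_grad():
            return self.bottleneck(self.pre(x))

# Usage: extract 40-dimensional features for a batch of spliced frames.
net = BottleneckDNN()
frames = torch.randn(8, 440)          # hypothetical batch of inputs
feats = net.extract_features(frames)  # shape (8, 40)
```

The same pattern carries over to the CNN and BLSTM variants compared in the paper: only the layers before the bottleneck change, while the narrow feature-producing layer and the discarded classifier head stay the same.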