Attention-based Neural Beamforming Layers for Multi-channel Speech Recognition.

2021 
Attention-based beamformers have recently been shown to be effective for multi-channel speech recognition. However, they are less capable of capturing local information. In this work, we propose a 2D Conv-Attention module which combines convolutional neural networks with attention for beamforming. We apply self- and cross-attention to explicitly model the correlations within and between the input channels. The end-to-end 2D Conv-Attention model is compared with multi-head self-attention and superdirective-based neural beamformers. We train and evaluate on an in-house multi-channel dataset. The results show a 3.8% relative WER improvement for the proposed model over the baseline neural beamformer.
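
The abstract describes combining 2D convolution (for local time-frequency structure) with attention across microphone channels (for inter-channel correlations). The sketch below is a minimal, hedged illustration of that idea in PyTorch, not the authors' implementation: the class name, layer sizes, and the (batch, channels, time, feature) input layout are assumptions made for the example.

```python
import torch
import torch.nn as nn

class ConvAttentionBeamformer(nn.Module):
    """Illustrative sketch of a conv-attention beamforming layer.

    A 2D convolution models local time-frequency patterns per channel;
    multi-head self-attention across the channel axis models correlations
    between microphones; averaging over channels yields a single
    "beamformed" feature stream for the downstream recognizer.
    """
    def __init__(self, feat_dim=80, conv_filters=16, n_heads=4):
        super().__init__()
        # Local modelling within each channel (assumed kernel size).
        self.conv = nn.Conv2d(1, conv_filters, kernel_size=3, padding=1)
        self.proj = nn.Linear(conv_filters * feat_dim, feat_dim)
        # Self-attention over the microphone-channel dimension.
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.out = nn.Linear(feat_dim, feat_dim)

    def forward(self, x):
        # x: (batch, n_channels, time, feat_dim)
        b, c, t, f = x.shape
        h = x.reshape(b * c, 1, t, f)
        h = torch.relu(self.conv(h))                      # (b*c, filters, t, f)
        h = h.permute(0, 2, 1, 3).reshape(b * c, t, -1)   # (b*c, t, filters*f)
        h = self.proj(h).reshape(b, c, t, f)
        # Attend across channels independently at each time frame.
        h = h.permute(0, 2, 1, 3).reshape(b * t, c, f)
        attn_out, _ = self.attn(h, h, h)                  # channel-wise attention
        y = self.out(attn_out.mean(dim=1))                # pool channels -> (b*t, f)
        return y.reshape(b, t, f)

if __name__ == "__main__":
    x = torch.randn(2, 4, 50, 80)  # batch=2, 4 mics, 50 frames, 80-dim features
    print(ConvAttentionBeamformer()(x).shape)  # torch.Size([2, 50, 80])
```

In this sketch the attention operates over channels only; the paper's actual module may also attend over time or use cross-attention between a reference channel and the others, which the abstract mentions but does not specify in detail.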