A Convolutional Network With Multi-Scale and Attention Mechanisms for End-to-End Single-Channel Speech Enhancement

Xiaoxiao Xiang, Xiaojuan Zhang, Haozhe Chen

2021

Deep neural network-based approaches are among the leading speech enhancement technologies and dominate recent development in single-channel speech enhancement. In this paper, we propose MASENet, a convolutional network with multi-scale and attention mechanisms for end-to-end single-channel speech enhancement. MASENet consists of five modules: a multi-scale speech encoder, a frequency-dilated module, a temporal convolutional attention module, a post-processing module, and a single-scale speech decoder. The frequency-dilated module and the temporal convolutional attention module are used to extract local and global information. Dense connections are used to avoid the vanishing gradient problem. Furthermore, we design an attention block to improve the discriminative learning ability of the network. Experimental results show that the proposed network achieves significantly better enhancement performance than the baselines in terms of objective speech intelligibility and quality metrics.
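As a rough illustration of why dilated convolutions and dense connections suit the goal of capturing both local and global information, the sketch below computes two standard quantities: the receptive field of a stack of stride-1 dilated 1-D convolutions, and the input channel count each layer sees in a densely connected block. The function names and all hyperparameters (kernel size 3, dilations 1, 2, 4, 8, 16 input channels, growth rate 8) are assumptions chosen for illustration; the abstract does not state MASENet's actual configuration.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def dense_input_channels(in_channels, growth, num_layers):
    """Input channels seen by each layer of a densely connected block.

    Layer i receives the block input concatenated with the outputs of
    all i preceding layers, each contributing `growth` channels.
    """
    return [in_channels + i * growth for i in range(num_layers)]


# Hypothetical hyperparameters, not taken from the paper.
dilations = [1, 2, 4, 8]
rf_dilated = receptive_field(3, dilations)
rf_plain = receptive_field(3, [1, 1, 1, 1])
channels = dense_input_channels(16, 8, len(dilations))

print(rf_dilated, rf_plain, channels)
```

With these assumed values, four dilated layers cover a much wider context than four undilated layers of the same kernel size, while the dense connections give every layer a direct path to earlier features, which is the usual argument for their benefit to gradient flow.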
