A Convolutional Network With Multi-Scale and Attention Mechanisms for End-to-End Single-Channel Speech Enhancement

2021 
Deep neural network-based approaches dominate recent development in single-channel speech enhancement. In this paper, we propose MASENet, a convolutional network with multi-scale and attention mechanisms for end-to-end single-channel speech enhancement. MASENet consists of five modules: a multi-scale speech encoder, a frequency-dilated module, a temporal convolutional attention module, a post-processing module, and a single-scale speech decoder. The frequency-dilated module and the temporal convolutional attention module are used to extract local and global information. Dense connections are used to alleviate the vanishing gradient problem, and an attention block is designed to improve the discriminative learning ability of the network. Experimental results show that the proposed network achieves significantly better enhancement performance than the baselines in terms of objective speech intelligibility and quality metrics.
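As an illustration of how dilated convolutions can cover both local and global context, the following is a minimal pure-Python sketch, not the MASENet implementation. The abstract does not state kernel sizes or dilation rates, so the kernel size of 3, the dilation schedule 1, 2, 4, 8, and the helper names `dilated_conv1d` and `receptive_field` are all assumptions chosen for illustration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1-D dilated convolution (cross-correlation) over a list of floats."""
    span = (len(kernel) - 1) * dilation  # distance between the first and last tap
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical settings (not taken from the paper).
dilations = [1, 2, 4, 8]
kernel = [1 / 3, 1 / 3, 1 / 3]  # simple averaging kernel of size 3

rf = receptive_field(len(kernel), dilations)

# Stack the layers on a toy 40-sample ramp signal.
x = [float(n) for n in range(40)]
y = x
for d in dilations:
    y = dilated_conv1d(y, kernel, d)

print("receptive field:", rf)
print("output length:", len(y))
```

With these assumed settings, four layers of three taps each see 31 input samples per output sample, whereas four undilated layers of the same kernel size would see only 9. Small dilations capture local structure and large ones capture longer-range context.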