Two Heads Are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement

2021 
In challenging acoustic scenarios such as low signal-to-noise ratios, current speech enhancement systems usually hit a performance bottleneck when extracting the target speech from the mixture in a single step. To address this issue, we propose a novel complex spectral mapping approach with a two-stage pipeline for monaural speech enhancement in the time-frequency domain. The proposed algorithm decouples the primal problem into multiple sub-problems, following the classic proverb, “two heads are better than one”. More specifically, in the first stage, only the magnitude is estimated, and it is then coupled with the noisy phase to obtain a coarse estimate of the complex spectrum. To refine this estimate, in the second stage, an auxiliary network serves as a post-processing module, where residual noise is further suppressed and the phase information is effectively modified. A global residual connection is adopted in the second stage to accelerate training convergence. To alleviate the parameter burden caused by the multi-stage pipeline, we propose a light-weight temporal convolutional module, which substantially reduces the number of trainable parameters and achieves even better objective performance than the original version. We conduct extensive experiments on three standard corpora: WSJ0-SI84, the DNS Challenge dataset, and the Voice Bank + DEMAND dataset. Objective test results demonstrate that the proposed approach achieves state-of-the-art performance over previous advanced systems under various conditions. Subjective listening test results further validate the superiority of the proposed method in terms of subjective quality.
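The data flow of the two stages described in the abstract can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the flat list of time-frequency bins, the choice to feed both the coarse and the noisy spectrum to the second stage, and the toy stand-ins for the two networks are all hypothetical. Only the two operations themselves (coupling the estimated magnitude with the noisy phase, and adding a second-stage residual to the coarse estimate) come from the abstract.

```python
import cmath


def coarse_complex_spectrum(est_mag, noisy_spec):
    """Stage 1 output: estimated magnitude coupled with the noisy phase."""
    return [m * cmath.exp(1j * cmath.phase(x)) for m, x in zip(est_mag, noisy_spec)]


def refine_with_global_residual(coarse_spec, residual_spec):
    """Stage 2 output: coarse estimate plus a predicted complex residual.

    The addition is the global residual connection; because the residual is
    complex, it can change both magnitude and phase.
    """
    return [c + r for c, r in zip(coarse_spec, residual_spec)]


def two_stage_enhance(noisy_spec, stage1_net, stage2_net):
    """Run both stages on a flat list of complex time-frequency bins.

    stage1_net and stage2_net are hypothetical callables standing in for the
    trained networks; their signatures are assumptions made for this sketch.
    """
    noisy_mag = [abs(x) for x in noisy_spec]
    est_mag = stage1_net(noisy_mag)                   # magnitude-only estimate
    coarse = coarse_complex_spectrum(est_mag, noisy_spec)
    residual = stage2_net(coarse, noisy_spec)         # assumed second-stage inputs
    return refine_with_global_residual(coarse, residual)


# Toy stand-ins (not real models): halve every magnitude, predict no residual.
def halve(mags):
    return [0.5 * m for m in mags]


def no_residual(coarse, noisy_in):
    return [0j for _ in coarse]


noisy = [3 + 4j, -1 + 1j, 0.5 - 2j]
enhanced = two_stage_enhance(noisy, halve, no_residual)
```

With these toy stand-ins the first stage only rescales the magnitude, so each output bin keeps the noisy phase; any phase change has to come from the complex residual added in the second stage.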