Single-Channel Speech Enhancement with Sequentially Trained DNN System

2019 
One recent approach to speech enhancement, especially in the monaural case, is to learn the mapping between a noisy speech mixture and the clean speech signal with a trained deep neural network (DNN). Such a model, however, often overfits the training data and performs poorly on noise and interference that are unseen during training. To address this issue and improve the generalization ability of the model, we propose an enhancement system with two DNNs trained sequentially on different training targets: the first removes the noise interference, and the second further improves speech quality using a time-frequency (T-F) mask. The TIMIT corpus and the non-speech noise and NOISEX datasets are used to generate the training and testing data. Evaluations in terms of perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and signal-to-distortion ratio (SDR) show that the proposed method outperforms the state-of-the-art method.
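The abstract does not define the mask or the network details, so the following is only a minimal sketch of the general T-F masking idea and of one evaluation metric, under stated assumptions. It uses the ideal ratio mask (IRM) with exponent beta = 0.5, a common mask-based training target that is not confirmed to be the one used in this work, and a plain energy-ratio SDR rather than the BSS Eval definition. The function names `ideal_ratio_mask`, `apply_mask` and `simple_sdr` are hypothetical, and the DNNs that would estimate the mask from noisy features are omitted.

```python
import math
from typing import List

# Magnitude spectrogram indexed as [frame][frequency bin].
Spectrogram = List[List[float]]


def ideal_ratio_mask(clean: Spectrogram, noise: Spectrogram, beta: float = 0.5) -> Spectrogram:
    """Ideal ratio mask per T-F unit: (|S|^2 / (|S|^2 + |N|^2)) ** beta.

    Assumption: speech and noise are additive and uncorrelated, so their
    energies add; beta = 0.5 is a common choice, not necessarily the paper's.
    """
    eps = 1e-12  # avoids division by zero in silent T-F units
    return [
        [(s * s / (s * s + n * n + eps)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(clean, noise)
    ]


def apply_mask(noisy: Spectrogram, mask: Spectrogram) -> Spectrogram:
    """Element-wise product of a T-F mask with a noisy magnitude spectrogram."""
    return [[y * m for y, m in zip(y_row, m_row)] for y_row, m_row in zip(noisy, mask)]


def simple_sdr(reference: List[float], estimate: List[float]) -> float:
    """SDR in dB as 10 * log10(||s||^2 / ||s - s_hat||^2).

    This is the plain energy-ratio form; BSS Eval SDR additionally projects
    the estimate onto the reference before measuring distortion.
    """
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / max(error_energy, 1e-12))


if __name__ == "__main__":
    clean = [[3.0, 0.0], [4.0, 1.0]]
    noise = [[4.0, 1.0], [3.0, 0.0]]
    # Toy noisy magnitudes under the uncorrelated-energy assumption: sqrt(S^2 + N^2).
    noisy = [[5.0, 1.0], [5.0, 1.0]]
    mask = ideal_ratio_mask(clean, noise)
    print(mask)
    print(apply_mask(noisy, mask))
    print(simple_sdr([1.0, 0.0, -1.0, 0.0], [0.9, 0.0, -0.9, 0.0]))
```

In a mask-based system, the clean and noise spectrograms are unavailable at test time, so a network is typically trained to predict such a mask from noisy (or first-stage enhanced) features; the ideal mask serves only as the training target.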