Monaural Speech Enhancement Based On Two Stage Long Short-Term Memory Networks

Yang Xian, Yang Sun, Wenwu Wang, Syed Mohsen Naqvi

2019

The performance of deep neural network (DNN) based monaural speech enhancement methods is still limited in real room environments, particularly in the speaker-independent case. Surface reflections and unseen speakers make it harder to estimate the sources from reverberant noisy speech mixtures. To address these issues, we propose a two-stage approach using long short-term memory (LSTM) networks. In the first stage, a trained LSTM estimates the dereverberation mask (DM), which is used to dereverberate the noisy speech mixture. In the second stage, a second trained LSTM estimates the ideal ratio mask (IRM), which is used to separate the desired speech signal from the dereverberated speech mixture. Results in terms of signal-to-distortion ratio (SDR) show the efficacy of the LSTMs over DNNs.
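The two-stage masking described in the abstract can be illustrated with a minimal sketch on magnitude spectrograms. Several things here are assumptions, not details from the paper: the masks are computed from known (oracle) signals instead of being predicted by LSTMs, the IRM uses one common square-root definition, the DM is taken to be the ratio of the anechoic to the reverberant mixture magnitude, and the toy data treats magnitudes as additive. The function names `ideal_ratio_mask`, `dereverberation_mask`, and `two_stage_enhance` are hypothetical.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero


def ideal_ratio_mask(speech_mag, noise_mag):
    """One common IRM definition (assumed here): sqrt(S^2 / (S^2 + N^2))."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + EPS))


def dereverberation_mask(anechoic_mix_mag, reverberant_mix_mag):
    """Hypothetical DM: anechoic mixture magnitude over reverberant mixture magnitude."""
    return anechoic_mix_mag / (reverberant_mix_mag + EPS)


def two_stage_enhance(reverberant_mix_mag, dm, irm):
    """Stage 1 applies the DM; stage 2 applies the IRM to the stage-1 output."""
    dereverberated = dm * reverberant_mix_mag
    return irm * dereverberated


# Toy data: 129 frequency bins x 50 frames of random magnitudes.
rng = np.random.default_rng(0)
speech = rng.random((129, 50))
noise = rng.random((129, 50))
anechoic_mix = speech + noise  # additive magnitudes: a toy simplification
reverberant_mix = anechoic_mix * (1.0 + rng.random((129, 50)))  # stand-in for reverberation

dm = dereverberation_mask(anechoic_mix, reverberant_mix)
irm = ideal_ratio_mask(speech, noise)
enhanced = two_stage_enhance(reverberant_mix, dm, irm)
```

In a trained system the two masks would be network outputs estimated from features of the reverberant noisy mixture; the oracle versions above only show how the stages compose.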

Keywords:

Long short-term memory
Monaural
Speech recognition
Speech enhancement
Computer science
Ideal ratio mask
Spectrogram
Deep neural networks
Noise measurement
Feature extraction
