Two-Stage Monaural Source Separation in Reverberant Room Environments Using Deep Neural Networks
2019
Deep neural networks (DNNs) have been applied to
dereverberation and separation in the monaural source separation
problem. However, the performance of current state-of-the-art
methods is limited, particularly in highly
reverberant room environments. In this paper, we propose a two-stage
approach with two DNN-based methods to address this
problem. In the first stage, the speech mixture is dereverberated
with the proposed dereverberation mask (DM). In the second stage,
the dereverberated speech mixture is separated with the ideal ratio
mask (IRM). To realize this
two-stage approach, the first DNN-based method integrates the DM with
the IRM to generate an enhanced time-frequency (T-F) mask, the ideal
enhanced mask (IEM), which serves as the training target for a single
DNN. In the second DNN-based
method, the DM and the IRM are predicted by two separate
DNNs. The IEEE and TIMIT corpora, together with real room impulse
responses (RIRs) and noise from the NOISEX dataset, are used to
generate speech mixtures for evaluation. The proposed methods
outperform state-of-the-art methods, particularly in highly reverberant
room environments.
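The masks named in the abstract can be illustrated with a small NumPy sketch. The IRM below follows its common definition, (|S|^2 / (|S|^2 + |N|^2))^β with β = 0.5. The abstract does not define the DM or how it is combined with the IRM into the IEM, so `dereverberation_mask` (a clipped ratio of dry to reverberant mixture magnitude) and `ideal_enhanced_mask` (an element-wise product) are hypothetical stand-ins, not the authors' formulas. The T-F magnitudes are random toy data.

```python
import numpy as np


def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5, eps=1e-8):
    """Common IRM definition: (|S|^2 / (|S|^2 + |N|^2)) ** beta, values in [0, 1]."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return (s2 / (s2 + n2 + eps)) ** beta


def dereverberation_mask(dry_mix_mag, reverb_mix_mag, eps=1e-8):
    """HYPOTHETICAL DM: magnitude ratio of the dry (non-reverberant) mixture
    to the reverberant mixture, clipped to [0, 1]. Not the paper's formula."""
    return np.clip(dry_mix_mag / (reverb_mix_mag + eps), 0.0, 1.0)


def ideal_enhanced_mask(dm, irm):
    """ASSUMED IEM: element-wise product, so one mask performs both stages."""
    return dm * irm


# Toy T-F magnitudes (frames x frequency bins); random data, not real speech.
rng = np.random.default_rng(0)
T, F = 100, 257
speech = rng.rayleigh(1.0, size=(T, F))
noise = rng.rayleigh(0.5, size=(T, F))
dry_mix = speech + noise  # crude assumption: magnitudes simply add
# Toy "reverberation" that only adds energy to every T-F bin.
reverb_mix = dry_mix * rng.uniform(1.0, 2.0, size=(T, F))

dm = dereverberation_mask(dry_mix, reverb_mix)
irm = ideal_ratio_mask(speech, noise)
iem = ideal_enhanced_mask(dm, irm)

# Two-DNN style: stage 1 dereverberates, stage 2 separates.
stage1 = dm * reverb_mix
two_stage = irm * stage1
# Single-DNN style: one combined mask applied to the reverberant mixture.
single = iem * reverb_mix

print(np.allclose(two_stage, single))  # True
```

Under the product assumption, applying the DM and then the IRM (as in the two-DNN method) yields the same output as applying the single IEM (as in the single-DNN method) when the masks are ideal. With DNN-estimated masks the two methods can behave differently, since each network's estimation errors enter the output in a different way.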
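Reverberant evaluation mixtures of the kind described in the abstract are typically built by convolving clean speech with an RIR and adding noise at a chosen signal-to-noise ratio (SNR). The sketch below assumes a single target utterance, an exponentially decaying synthetic RIR (`synthetic_rir`, a hypothetical stand-in for the real measured RIRs), white Gaussian noise in place of a NOISEX recording, and an SNR measured against the reverberant speech. The RT60 and SNR values are illustrative, not the paper's settings.

```python
import numpy as np


def reverberate(signal, rir):
    """Convolve a dry signal with a room impulse response, trimmed to the input length."""
    return np.convolve(signal, rir)[: len(signal)]


def scale_to_snr(reference, noise, snr_db):
    """Scale `noise` so that 10*log10(P_reference / P_noise) equals snr_db."""
    p_ref = np.mean(reference ** 2)
    p_noise = np.mean(noise ** 2)
    return noise * np.sqrt(p_ref / (p_noise * 10 ** (snr_db / 10)))


def synthetic_rir(fs, rt60, length_s=0.5, rng=None):
    """HYPOTHETICAL stand-in for a measured RIR: a unit direct path followed by
    exponentially decaying Gaussian noise whose energy falls 60 dB after rt60 s."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(length_s * fs)) / fs
    envelope = 10 ** (-3 * t / rt60)  # amplitude 1e-3 (i.e. -60 dB) at t = rt60
    rir = 0.1 * rng.standard_normal(t.size) * envelope
    rir[0] = 1.0  # direct path
    return rir


fs = 16000
rng = np.random.default_rng(1)
t = np.arange(fs) / fs
# Toy amplitude-modulated tone standing in for a clean IEEE/TIMIT utterance.
dry_speech = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
noise = rng.standard_normal(fs)  # placeholder for a NOISEX noise segment

rir = synthetic_rir(fs, rt60=0.6, rng=rng)
reverb_speech = reverberate(dry_speech, rir)
# Assumption: SNR is defined relative to the reverberant speech.
noise_component = scale_to_snr(reverb_speech, noise, snr_db=0.0)
mixture = reverb_speech + noise_component
```

The dry signal (`dry_speech` plus the scaled noise) is what a dereverberation target would be computed from, while `mixture` is the network input; swapping in measured RIRs and recorded noise only changes how `rir` and `noise` are loaded.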