Using Generalized Gaussian Distributions to Improve Regression Error Modeling for Deep Learning-Based Speech Enhancement

2019 
From a statistical perspective, the conventional minimum mean squared error (MMSE) criterion can be regarded as the maximum likelihood (ML) solution under an assumed homoscedastic Gaussian error model. However, in this paper, a statistical analysis reveals the super-Gaussian and heteroscedastic properties of the prediction errors in nonlinear regression deep neural network (DNN)-based speech enhancement, where clean log-power spectral (LPS) components are estimated at the DNN outputs from noisy LPS features in the DNN input vectors. Accordingly, we propose treating all dimensions of the prediction error vector as statistically independent random variables and modeling them with generalized Gaussian distributions (GGDs). The objective function under the GGD error model is then derived according to the ML criterion. Experiments on the TIMIT corpus corrupted by simulated additive noises show consistent improvements of the proposed DNN framework over the conventional one in terms of various objective quality measures, across 14 unseen noise types and a range of signal-to-noise ratio levels. Furthermore, the ML optimization objective with the GGD error model outperforms the conventional MMSE criterion, achieving improved generalization and robustness.
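To make the abstract's core idea concrete, the following is a minimal sketch of the negative log-likelihood objective under the GGD error model described above. The function name `ggd_nll` and its parameters are illustrative, not taken from the paper; the GGD density used is the standard form p(e) = β / (2α Γ(1/β)) · exp(−(|e|/α)^β), where α is the scale and β the shape parameter. With β = 2 and α = √2·σ this reduces to the Gaussian likelihood, whose ML solution corresponds to the conventional MMSE criterion; β < 2 yields the super-Gaussian (heavier-tailed) error models the paper advocates.

```python
import math

def ggd_nll(errors, alpha, beta):
    """Negative log-likelihood of a prediction-error vector under a
    generalized Gaussian distribution (GGD), treating each dimension
    as an independent random variable.

    pdf per dimension: p(e) = beta / (2*alpha*Gamma(1/beta))
                              * exp(-(|e|/alpha)**beta)

    alpha > 0 : scale parameter
    beta  > 0 : shape parameter (beta=2 -> Gaussian, beta=1 -> Laplacian)
    """
    # Log of the normalizing constant 2*alpha*Gamma(1/beta)/beta,
    # identical for every error dimension.
    log_norm = math.log(2.0 * alpha) + math.lgamma(1.0 / beta) - math.log(beta)
    # Sum of per-dimension negative log-likelihoods.
    return sum((abs(e) / alpha) ** beta + log_norm for e in errors)
```

As a sanity check on the β = 2 special case: with α = √2·σ the expression reduces term by term to the usual Gaussian negative log-likelihood, ½(e/σ)² + ½log(2πσ²) per dimension, confirming that minimizing MSE is the ML solution only under the homoscedastic Gaussian assumption.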