Time-Domain Neural Network Approach for Speech Bandwidth Extension

Xiang Hao, Chenglin Xu, Nana Hou, Lei Xie, Eng Siong Chng, Haizhou Li

2020

In this paper, we study a time-domain neural network approach to speech bandwidth extension. We propose a network architecture, named the multi-scale fusion neural network (MfNet), that gradually restores the low-frequency signal and predicts the high-frequency signal by exchanging information across representations at different scales. We also propose a training scheme that optimizes the network with a combination of a perceptual loss and a time-domain adversarial loss. Experiments show that the proposed multi-scale fusion network consistently outperforms the competing methods in terms of perceptual evaluation of speech quality (PESQ), signal-to-distortion ratio (SDR), signal-to-noise ratio (SNR), log-spectral distance (LSD), and word error rate (WER). Moreover, the multi-scale fusion network requires only 10% of the parameters of the time-domain reference baseline.
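As a rough illustration of two of the evaluation metrics named above, the following is a minimal sketch of SNR and LSD for a pair of time-domain signals. It is not the authors' evaluation code: the function names, the frame length, the hop size, the `eps` floor, and the use of a naive DFT without windowing are all assumptions made to keep the example self-contained, and the LSD form shown is one definition commonly used in bandwidth-extension work.

```python
import cmath
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference and an estimate."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0.0:
        return math.inf  # identical signals: no error energy
    return 10.0 * math.log10(signal_power / noise_power)


def power_spectrum(frame):
    """Naive DFT power spectrum of one frame (non-negative frequencies only)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def lsd(reference, estimate, frame_len=64, hop=32, eps=1e-10):
    """Log-spectral distance averaged over frames (lower is better).

    frame_len, hop, and eps are illustrative choices, not values from the paper.
    """
    if len(reference) < frame_len or len(estimate) < frame_len:
        raise ValueError("signals must be at least one frame long")
    distances = []
    last_start = min(len(reference), len(estimate)) - frame_len
    for start in range(0, last_start + 1, hop):
        ref_ps = power_spectrum(reference[start:start + frame_len])
        est_ps = power_spectrum(estimate[start:start + frame_len])
        # Squared difference of log power spectra, per frequency bin
        sq = [(math.log10(r + eps) - math.log10(e + eps)) ** 2
              for r, e in zip(ref_ps, est_ps)]
        # Root mean over frequency bins gives the distance for this frame
        distances.append(math.sqrt(sum(sq) / len(sq)))
    return sum(distances) / len(distances)


# Toy signals: a sine wave and a copy at half amplitude
reference = [math.sin(2 * math.pi * 5 * i / 256) for i in range(256)]
estimate = [0.5 * r for r in reference]
```

With these toy signals, `snr_db(reference, estimate)` equals `10 * log10(4)`, about 6.02 dB, because halving the amplitude leaves an error with one quarter of the reference energy. A higher SNR and a lower LSD both indicate a closer match to the reference.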
