Time-Domain Neural Network Approach for Speech Bandwidth Extension

2020 
In this paper, we study the time-domain neural network approach for speech bandwidth extension. We propose a network architecture, named multi-scale fusion neural network (MfNet), that gradually restores the low-frequency signal and predicts the high-frequency signal through the exchange of information across different scale representations. We propose a training scheme to optimize the network with a combination of perceptual loss and time-domain adversarial loss. Experiments show the proposed multi-scale fusion network consistently outperforms the competing methods in terms of perceptual evaluation of speech quality (PESQ), signal to distortion rate (SDR), signal to noise ratio (SNR), log-spectral distance (LSD) and word error rate (WER). More promisingly, the multi-scale fusion network requires only 10% of the parameters of the time-domain reference baseline.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    38
    References
    4
    Citations
    NaN
    KQI
    []