DNN-based Spectral Enhancement for Neural Waveform Generators with Low-bit Quantization

2019 
This paper presents a spectral enhancement method that improves the quality of speech reconstructed by neural waveform generators with low-bit quantization. At the training stage, the method builds a multiple-target DNN that predicts the log amplitude spectra of natural high-bit waveforms together with the amplitude ratios between natural and distorted spectra. The model input is the log amplitude spectra of waveforms reconstructed by low-bit neural waveform generators. At the generation stage, the enhanced amplitude spectra are obtained by an ensemble decoding strategy and are then combined with the phase spectra of the low-bit waveforms to produce the final waveforms by inverse STFT. In our experiments on WaveRNN vocoders, an 8-bit WaveRNN with spectral enhancement outperforms a 16-bit counterpart of the same model complexity in terms of the quality of reconstructed waveforms. In addition, the proposed method helps an 8-bit WaveRNN with reduced model complexity achieve subjective performance similar to that of a conventional 16-bit WaveRNN.
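The generation stage described above can be illustrated with a minimal single-frame sketch in plain Python. It is not the authors' implementation. The abstract does not specify the ensemble decoding strategy, so the sketch assumes it averages two log-amplitude estimates: the DNN's direct prediction, and the input log amplitude plus the log of the predicted amplitude ratio. All function names are hypothetical, the DNN outputs are passed in as plain lists, and a naive DFT on one frame stands in for a full STFT with windowing and overlap-add.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; keeps the real part, since the target is a real waveform."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def log_amplitude(frame, floor=1e-12):
    """Log amplitude spectrum of a frame, floored to avoid log(0)."""
    return [math.log(max(abs(c), floor)) for c in dft(frame)]


def ensemble_log_amplitude(direct_log_amp, input_log_amp, amp_ratio):
    """ASSUMPTION: average two estimates of the natural log amplitude.

    direct_log_amp: log amplitude predicted directly by the DNN
    input_log_amp:  log amplitude of the low-bit (distorted) frame
    amp_ratio:      predicted natural/distorted amplitude ratio (linear scale)
    """
    return [
        0.5 * (d + (x + math.log(r)))
        for d, x, r in zip(direct_log_amp, input_log_amp, amp_ratio)
    ]


def reconstruct_frame(enhanced_log_amp, lowbit_frame):
    """Combine an enhanced amplitude spectrum with the low-bit frame's phase."""
    phases = [cmath.phase(c) for c in dft(lowbit_frame)]
    spectrum = [cmath.rect(math.exp(a), p) for a, p in zip(enhanced_log_amp, phases)]
    return idft(spectrum)


if __name__ == "__main__":
    lowbit = [0.5, -0.25, 0.75, 0.1, -0.6, 0.3, 0.2, -0.4]
    x = log_amplitude(lowbit)
    # Stand-in "predictions": the DNN would normally supply these two lists.
    direct = [a + 0.1 for a in x]
    ratio = [1.2] * len(x)
    enhanced = ensemble_log_amplitude(direct, x, ratio)
    print(reconstruct_frame(enhanced, lowbit))
```

Because only the amplitude is replaced, feeding the low-bit frame's own log amplitude back into `reconstruct_frame` returns the low-bit frame unchanged, which is a convenient sanity check for the amplitude and phase recombination.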