An investigation into instantaneous frequency estimation methods for improved speech recognition features

2017 
Several recent studies point to the importance of the analytic phase of the speech signal in human perception, especially in noisy conditions. However, phase information is still not used in state-of-the-art speech recognition systems. In this paper, we illustrate the importance of the analytic phase of the speech signal for automatic speech recognition. As the computation of the analytic phase suffers from the inevitable phase-wrapping problem, we extract features from its time derivative, referred to as the instantaneous frequency (IF). We highlight the issues involved in IF extraction from speech-like signals and propose suitable modifications for IF extraction from speech signals. We used the deep neural network (DNN) framework to build a speech recognition system using features extracted from the IF of speech signals. The speech recognition system based on IF features delivered a phoneme error rate of 21.8% on the TIMIT database, while the baseline system based on mel-frequency cepstral coefficients (MFCCs) delivered a phoneme error rate of 18.4%. Combining the IF-based and MFCC-based systems using minimum Bayes risk (MBR) decoding provided a relative improvement of 8.7% over the baseline system, illustrating the significance of the analytic phase for speech recognition.
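To make the idea of IF as the time derivative of the analytic phase concrete, the following is a minimal sketch of a textbook estimator, not the modified extraction method proposed in the paper. It builds the analytic signal with an FFT and takes the phase increment between consecutive samples, which sidesteps explicit phase unwrapping. The function names, the 1 kHz test tone, and the signal length are assumptions chosen for illustration only.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: zero negative frequencies, double positive ones."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0  # keep the DC bin as is
    if n % 2 == 0:
        gain[n // 2] = 1.0       # keep the Nyquist bin as is
        gain[1:n // 2] = 2.0     # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

def instantaneous_frequency(x, fs):
    """IF in Hz from the phase increment between consecutive analytic samples."""
    z = analytic_signal(x)
    # angle(z[n+1] * conj(z[n])) is the phase difference, already confined to
    # (-pi, pi], so the wrapped phase itself never has to be unwrapped.
    dphi = np.angle(z[1:] * np.conj(z[:-1]))
    return dphi * fs / (2.0 * np.pi)

fs = 16000                           # sampling rate in Hz (TIMIT audio is 16 kHz)
t = np.arange(1600) / fs             # 0.1 s of samples (illustrative length)
tone = np.cos(2 * np.pi * 1000 * t)  # hypothetical 1 kHz test tone
if_hz = instantaneous_frequency(tone, fs)
median_if = float(np.median(if_hz))  # should sit close to 1000 Hz for this tone
```

For a single stationary tone this estimate is well behaved. Real speech is multi-component, so an estimator like this would normally be applied per narrowband channel, and it can behave erratically where the signal envelope approaches zero.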