Cochlea-inspired speech recognition interface

2019 
Automatic speech recognition (ASR) technology provides a natural interface for human-machine interaction. Typical ASR systems can achieve high performance in quiet environments but, unlike humans, perform poorly in real-world situations. To better simulate the human auditory periphery and improve the performance in realistic noisy scenarios, we propose two models of speech recognition front-ends based on a biophysical cochlear model. The first front-end is based on the method of signal reconstruction from a basilar membrane response. When applied to noisy speech, this method results in improved signal quality. This method can be used as a preprocessing step in a standard ASR system and can also be used as a noise reduction technique for other applications. The second front-end we propose is based on the construction of speech recognition coefficients directly from a basilar membrane response. Experimental results using a continuous-density hidden Markov model (HMM) recognizer demonstrate significant improvement in performance compared to standard Mel-frequency cepstral coefficients (MFCC) in various types of noisy conditions.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    30
    References
    0
    Citations
    NaN
    KQI
    []