Nonstationary speech analysis using neural prediction

2000 
This paper discusses extracting dynamic features of individual speakers from short speech segments for robust recognition. As a method of homomorphic signal processing, cepstrum analysis can separate the excitation from the impulse response of the vocal tract when applied to speech signals. For short-time spectrum analysis, overlapping windows are used to divide the speech into many frames, and one cepstrum vector is obtained for each window. Within each frame (about 30 ms), the speech signal is assumed to be stationary. Over longer time intervals, however, speech is fundamentally nonstationary, so the dynamic changes between frames must be taken into account. Conventional methods often use only the static features of the short-time cepstrum. A neural network can be seen as a nonlinear dynamic system, which may express both the static and the dynamic features of the signal at hand. For this purpose, a neural prediction network was designed to extract the inter- and intraframe correlations of cepstrum vectors, so as to obtain robust features of individual speakers from very short speech epochs.
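
The framing and cepstrum steps described above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the sampling rate, the Hamming window, the 50% overlap, the number of cepstral coefficients, the synthetic test signal, and the helper names `frame_signal` and `real_cepstrum` are all assumptions. The abstract does not specify the neural prediction network, so the final step substitutes a plain least-squares linear predictor of each cepstrum vector from the previous one, only to show what an interframe prediction residual looks like.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def real_cepstrum(frames, n_coeffs):
    """Real cepstrum per frame: inverse FFT of the log magnitude spectrum."""
    windowed = frames * np.hamming(frames.shape[1])
    spectrum = np.fft.rfft(windowed, axis=1)
    log_mag = np.log(np.abs(spectrum) + 1e-10)  # small offset avoids log(0)
    cepstrum = np.fft.irfft(log_mag, n=frames.shape[1], axis=1)
    return cepstrum[:, :n_coeffs]

# Assumed setup: 1 s of synthetic "speech-like" signal at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 120 * t) + 0.1 * rng.standard_normal(fs)

frame_len = fs * 30 // 1000   # 30 ms frames, as in the text
hop = frame_len // 2          # 50% overlap (assumption)

frames = frame_signal(x, frame_len, hop)
ceps = real_cepstrum(frames, n_coeffs=12)  # one cepstrum vector per frame

# Linear stand-in for the neural predictor (assumption, not the paper's network):
# predict frame t+1's cepstrum vector from frame t's by least squares.
X, Y = ceps[:-1], ceps[1:]
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
residual = Y - X @ W  # interframe prediction error, one row per frame pair
```

In this sketch each row of `ceps` is the static feature of one frame, while `residual` captures how poorly a fixed linear map explains the frame-to-frame change. A nonlinear predictor, such as the neural network the abstract describes, would take the place of `W`.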