Role of Prosodic Features on Children's Speech Recognition

2018 
In this paper, we explore the role of combining prosodic variables with existing acoustic features for children's speech recognition using acoustic models trained on adults' speech. The acoustic features explored are Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction cepstral coefficients (PLPCC), while the prosodic variables considered are loudness, voice-intensity and voice-probability. An analysis presented in this paper shows that, when the textual content is the same, the considered prosodic variables exhibit very similar contours for adults' and children's speech, whereas the contours differ substantially when the context differs. Consequently, including prosodic information reduces inter-speaker differences and increases class discrimination, which in turn improves recognition performance. Further improvements are obtained by projecting the combined acoustic and prosodic feature vectors onto a lower-dimensional subspace. These findings are experimentally verified for mismatched speech recognition using a deep neural network (DNN) based system. On combining MFCC (PLPCC) with prosodic features, a relative improvement of 16% (14%) is noted when decoding children's speech using DNN models trained on adult data.
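
The feature combination and subspace projection described in the abstract can be illustrated with a minimal sketch. The abstract does not name the projection method, so principal component analysis (via SVD) is used here purely as a stand-in; the function name `combine_and_project`, the 13-dimensional MFCC frames, the target dimension of 10, and the random data are all assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def combine_and_project(acoustic, prosodic, out_dim):
    """Concatenate per-frame features, then project onto the top principal directions.

    PCA is an assumed stand-in; the paper's actual projection may differ.
    """
    if acoustic.shape[0] != prosodic.shape[0]:
        raise ValueError("feature streams must have the same number of frames")
    # Frame-wise concatenation: (frames, d_acoustic + d_prosodic)
    combined = np.hstack([acoustic, prosodic])
    centered = combined - combined.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projection = vt[:out_dim].T
    return centered @ projection, projection

rng = np.random.default_rng(0)
mfcc = rng.normal(size=(200, 13))     # hypothetical 13-dim MFCC frames
prosody = rng.normal(size=(200, 3))   # loudness, voice-intensity, voice-probability
reduced, projection = combine_and_project(mfcc, prosody, out_dim=10)
print(reduced.shape)  # (200, 10)
```

In a real system the projection matrix would be estimated on the adult training data and then applied unchanged to the children's test features, so that both pass through the same transform before reaching the DNN.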