Exploring the Role of Speaking-Rate Adaptation on Children's Speech Recognition

2018 
Earlier studies have shown that both intra-speaker and inter-speaker variabilities affect the recognition performance of any automatic speech recognition (ASR) system. Differences in speaking rate among speakers are one such factor affecting system performance. An extreme example where speaking-rate variations become detrimental is the task of recognizing children's speech on an ASR system trained using adults' speech. In the context of such mismatched ASR tasks, only a few works on speaking-rate adaptation (SRA) through time-scale modification (TSM) have been reported. Moreover, those works explored the effect of TSM only on ASR systems developed using Gaussian mixture models (GMM). Motivated by these facts, this work studies the role of SRA in ASR systems employing deep neural networks (DNN) for statistical modeling. Further, adaptation is performed by changing the frame length and frame overlap during the front-end speech parameterization phase. The experimental evaluations reported in this work show that SRA leads to significant reductions in errors. The effect of combining SRA with explicit pitch modification, which has been reported to be very effective for children's mismatched ASR, is also studied. Combining the two techniques results in additive reductions in errors.
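The abstract states that adaptation is done by changing the frame length and overlap during front-end parameterization, but does not give the actual values or scaling rule. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a conventional 25 ms window with a 10 ms shift as the baseline and a hypothetical factor `rate_factor` that stretches both equally, so the overlap ratio is preserved while fewer frames are extracted per second of speech. The function name `frame_signal` and its parameters are illustrative assumptions.

```python
from typing import List, Sequence


def frame_signal(signal: Sequence[float], sample_rate: int,
                 frame_ms: float = 25.0, shift_ms: float = 10.0,
                 rate_factor: float = 1.0) -> List[List[float]]:
    """Split a signal into overlapping frames.

    Both the frame length and the frame shift are multiplied by the
    hypothetical speaking-rate factor `rate_factor`, so the ratio of
    overlap to frame length stays the same while the frame count changes.
    """
    if rate_factor <= 0:
        raise ValueError("rate_factor must be positive")
    # Convert durations (ms) to sample counts, scaled by the rate factor.
    frame_len = int(round(sample_rate * frame_ms * rate_factor / 1000.0))
    frame_shift = int(round(sample_rate * shift_ms * rate_factor / 1000.0))
    if frame_len <= 0 or frame_shift <= 0:
        raise ValueError("frame length and shift must be at least one sample")
    if len(signal) < frame_len:
        return []
    # Only complete frames are kept; trailing samples are dropped.
    num_frames = 1 + (len(signal) - frame_len) // frame_shift
    return [list(signal[i * frame_shift: i * frame_shift + frame_len])
            for i in range(num_frames)]
```

For a one-second signal at 16 kHz, the default settings give 98 frames of 400 samples, while `rate_factor=1.2` gives 81 frames of 480 samples. Whether the factor should be above or below 1 for a particular child speaker is not stated in the abstract; in practice the framed output would then feed the usual spectral feature pipeline.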