Advancing speech activity detection for automatic speech assessment of pre-school children prompted speech using COMBO-SAD
Speech sound disorder (SSD), a developmental disorder that affects children's ability to produce the sounds (and words) of their native language, has a prevalence of 3%–16% among children in the USA. Screening for SSDs generally requires recording, evaluation, and decision-making by a certified speech-language pathologist (SLP). Automating part or all of this process could significantly reduce the time and effort that screening requires. However, to automate this process, and especially the final “pass”/“fail” screening decision, children's speech content must first be extracted from the collected audio sample, which requires speech/silence activity detection. For this study, an iOS application for field use was developed to collect word productions from children for algorithmic processing, and all participants were assigned a Percentage of Consonants Correct-Revised (PCC-R) score by a certified SLP. An unsupervised speech-activity-detection (SAD) algorithm is explored: COMBO-SAD, originally developed during the DARPA-RATS program, was modified for use on child speech. Model evaluation was performed on a diverse collected child corpus in relation to participants' PCC-R scores. Finally, a duration “shoulder” extension of SAD boundary labels was analyzed to benchmark its potential impact on model performance.
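To make the “shoulder” idea concrete, the following is a minimal sketch of one way a fixed-duration shoulder could be applied to SAD boundary labels. It is not the COMBO-SAD implementation, and the abstract does not specify how the extension is performed. The function name `extend_shoulders`, the representation of segments as (start, end) pairs in seconds, the symmetric padding, and the merging of segments that come to overlap are all assumptions made for illustration.

```python
from typing import List, Tuple

# A speech segment as (start_time, end_time) in seconds (assumed representation).
Segment = Tuple[float, float]


def extend_shoulders(
    segments: List[Segment], shoulder: float, total_duration: float
) -> List[Segment]:
    """Pad each speech segment by `shoulder` seconds on both sides.

    Hypothetical helper: boundaries are clipped to [0, total_duration],
    and segments that touch or overlap after padding are merged.
    """
    # Widen every segment, keeping the boundaries inside the recording.
    widened = [
        (max(0.0, start - shoulder), min(total_duration, end + shoulder))
        for start, end in sorted(segments)
    ]

    merged: List[Segment] = []
    for start, end in widened:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous segment: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    labels = [(1.0, 2.0), (2.25, 3.0)]
    print(extend_shoulders(labels, shoulder=0.25, total_duration=3.125))
```

In this sketch, a larger shoulder trades missed speech onsets and offsets for more non-speech audio passed downstream, which is the kind of trade-off a shoulder analysis would be expected to quantify.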