Significance of sonority information for voiced/unvoiced decision in speech synthesis

Abstract The quality of speech synthesized by statistical parametric speech synthesis (SPSS) depends heavily on how the excitation source is generated. The voiced/unvoiced decision is an essential component of excitation source generation. In the existing literature, it is obtained from the fundamental frequency and other excitation source evidence. The discontinuity at the point of contact of the vocal folds injects energy into the vocal tract, resulting in the voicing effect in the produced speech signal. The perceptual effect of voicing on the produced sound is correlated with sonority, which is associated with weak vocal-tract constriction and significant glottal vibration. Consequently, the variation in voicing that accompanies changes in supraglottal pressure due to vocal-tract constriction, in the rate of vocal-fold closure, and in the regularity of the signal structure is preserved in the sonority associated with a sound unit. Voicing and the degree of vocal-tract opening are the two most effective correlates of sonority, and they potentially contribute to the sonority hierarchy uniformly across sonorants and obstruents. The voicing effect can therefore be captured by a sonority measure derived from system, source, and suprasegmental information in the speech signal. In this work, a novel voiced/unvoiced decision method based on sonority information is proposed and integrated into the SPSS framework for excitation source generation. It yields better voicing decisions than existing methods and, in turn, synthesized speech of improved quality, as confirmed by objective and subjective evaluation.
