\(F_{0}\) Modeling Using DNN for Arabic Parametric Speech Synthesis

2019 
Deep neural networks (DNNs) are attracting increasing interest in speech processing applications, especially text-to-speech synthesis. Indeed, state-of-the-art speech generation tools such as MERLIN and WAVENET are entirely DNN-based. However, each language has to be modeled separately. A key component of a speech synthesis system is the module that generates prosodic parameters from contextual input features, and in particular the fundamental frequency (\(F_{0}\)) generation module. Because \(F_{0}\) is responsible for intonation, it must be modeled accurately to produce intelligible and natural speech. \(F_{0}\) modeling is highly language-dependent, so language-specific characteristics have to be taken into account. In this paper, we aim to model \(F_{0}\) for Arabic speech synthesis with feedforward and recurrent DNNs, using Arabic-specific features such as vowel quantity and gemination, in order to improve the quality of Arabic parametric speech synthesis.