Synthesizing 3D Acoustic-Articulatory Mapping Trajectories: Predicting Articulatory Movements by Long-Term Recurrent Convolutional Neural Network

2018 
Robust and accurate prediction of articulatory movements has important applications, such as 3D articulatory animation and visual communication. Various approaches have been proposed for the acoustic-articulatory mapping problem, but their precision remains insufficient. Recently, deep neural networks (DNNs), especially convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have brought tremendous success in speech recognition and synthesis. To increase accuracy, we propose a new network architecture for acoustic-articulatory mapping, called the long-term recurrent convolutional neural network (LTRCNN). The network consists of a CNN, an RNN, and a skip connection. The CNN efficiently models the spectral correlation among acoustic features. The RNN, such as a long short-term memory (LSTM) network, learns temporal context from sequential data. In addition, the skip connection enriches the input representation with features from different levels, preserving feature information. Experiments show that LTRCNN achieves state-of-the-art results on this prediction task, with a root-mean-squared error (RMSE) of 0.690 mm and a correlation coefficient of 0.949.
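The two evaluation metrics reported above are standard for articulatory inversion. As a minimal sketch (the trajectories below are hypothetical toy values, not from the paper), RMSE and the Pearson correlation coefficient between a predicted and a measured articulator trajectory can be computed as:

```python
import math

def rmse(pred, target):
    """Root-mean-squared error between two equal-length trajectories (in mm)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def pearson(pred, target):
    """Pearson correlation coefficient between two trajectories."""
    n = len(pred)
    mp = sum(pred) / n
    mt = sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_t = sum((t - mt) ** 2 for t in target)
    return cov / math.sqrt(var_p * var_t)

# Toy example: one articulator coordinate over five frames (hypothetical values, mm)
true_traj = [0.0, 1.0, 2.0, 3.0, 4.0]
pred_traj = [0.1, 0.9, 2.2, 2.8, 4.1]
print(round(rmse(pred_traj, true_traj), 3))     # → 0.148
print(round(pearson(pred_traj, true_traj), 3))  # → 0.995
```

In practice these metrics are averaged over all articulator channels and test utterances, which is how summary figures like 0.690 mm RMSE and 0.949 correlation are typically obtained.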