Predicting tongue motion in unlabeled ultrasound videos using convolutional LSTM neural network

2019 
A challenge in speech production research is to predict future tongue movements based on a short period of past tongue movements. This study tackles the speaker-dependent tongue motion prediction problem in unlabeled ultrasound videos with convolutional long short-term memory (ConvLSTM) networks. The model has been tested on two different ultrasound corpora. ConvLSTM outperforms the 3-dimensional convolutional neural network (3DCNN) at predicting the 9th frame from the 8 preceding frames, and also shows a good capacity to predict only the tongue contours in future frames. Further tests reveal that ConvLSTM can also learn to predict tongue movements in more distant frames beyond the immediately following frame. Our code is available at: this https URL.
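The abstract describes a ConvLSTM that maps 8 past ultrasound frames to the 9th frame. Below is a minimal sketch of such a next-frame predictor, not the authors' released code: it uses Keras' ConvLSTM2D layers, and the frame size (64x64), filter counts, and training setup are illustrative assumptions.

```python
# Minimal ConvLSTM next-frame predictor sketch (assumed shapes, not the paper's exact model).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_convlstm_predictor(frames=8, height=64, width=64, channels=1):
    """Stack ConvLSTM2D layers and map the last hidden state to one predicted frame."""
    inputs = layers.Input(shape=(frames, height, width, channels))
    x = layers.ConvLSTM2D(32, kernel_size=3, padding="same",
                          return_sequences=True, activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ConvLSTM2D(32, kernel_size=3, padding="same",
                          return_sequences=False, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    # Project the final hidden state to a single grayscale frame (the "9th" frame).
    outputs = layers.Conv2D(channels, kernel_size=3, padding="same",
                            activation="sigmoid")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

if __name__ == "__main__":
    model = build_convlstm_predictor()
    # Dummy batch: 4 clips of 8 frames each; the target is the following frame.
    clips = np.random.rand(4, 8, 64, 64, 1).astype("float32")
    next_frames = np.random.rand(4, 64, 64, 1).astype("float32")
    model.fit(clips, next_frames, epochs=1, verbose=0)
    print(model.predict(clips).shape)  # (4, 64, 64, 1)
```

To predict frames further into the future, a model like this could be applied recursively, feeding each predicted frame back into the input window; the paper's specific multi-step setup may differ.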