Hybrid Attention based Multimodal Network for Spoken Language Classification.

2018

We examine the utility of linguistic content and vocal characteristics for multimodal deep learning in human spoken language understanding. We present a deep multimodal network with both feature attention and modality attention to classify utterance-level speech data. The proposed hybrid attention architecture helps the system focus on learning informative representations for both modality-specific feature extraction and model fusion. The experimental results show that our system achieves state-of-the-art or competitive results on three published multimodal datasets. We also demonstrate the effectiveness and generalization of our system on a medical speech dataset collected from an actual trauma scenario. Furthermore, we provide a detailed comparison and analysis of traditional approaches and deep learning methods for both feature extraction and fusion.
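The abstract describes two attention mechanisms: feature attention, which reweights the dimensions of each modality-specific representation, and modality attention, which weights the modalities themselves before fusion. The paper's exact layer sizes and encoders are not given here, so the following is only a minimal PyTorch sketch of that general idea; the module names, projection layers, and dimensions (text_dim, audio_dim, hidden_dim) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAttention(nn.Module):
    """Scores each feature dimension of one modality and reweights it (assumed form)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim, feat_dim)

    def forward(self, x):                                   # x: (batch, feat_dim)
        weights = torch.sigmoid(self.score(x))              # per-feature attention weights
        return x * weights                                   # attended representation


class ModalityAttentionFusion(nn.Module):
    """Learns a weight per modality and sums the weighted representations (assumed form)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, modality_reprs):                       # list of (batch, feat_dim)
        stacked = torch.stack(modality_reprs, dim=1)         # (batch, M, feat_dim)
        scores = self.score(stacked).squeeze(-1)             # (batch, M)
        alphas = F.softmax(scores, dim=1).unsqueeze(-1)      # (batch, M, 1)
        return (alphas * stacked).sum(dim=1)                 # fused (batch, feat_dim)


class HybridAttentionClassifier(nn.Module):
    """Feature attention per modality, then modality-attention fusion, then a classifier."""
    def __init__(self, text_dim, audio_dim, hidden_dim, num_classes):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.text_attn = FeatureAttention(hidden_dim)
        self.audio_attn = FeatureAttention(hidden_dim)
        self.fusion = ModalityAttentionFusion(hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, text_feats, audio_feats):
        t = self.text_attn(torch.relu(self.text_proj(text_feats)))
        a = self.audio_attn(torch.relu(self.audio_proj(audio_feats)))
        fused = self.fusion([t, a])
        return self.classifier(fused)                        # utterance-level class logits


# Example usage with random utterance-level features (shapes are hypothetical)
model = HybridAttentionClassifier(text_dim=300, audio_dim=128, hidden_dim=256, num_classes=4)
logits = model(torch.randn(8, 300), torch.randn(8, 128))     # -> (8, 4)
```

The sketch uses simple linear projections in place of the paper's modality-specific encoders; the point is only how the two attention stages compose, with feature attention applied inside each modality branch and modality attention applied at the fusion step.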