Emphasis Detection for Voice Dialogue Applications Using Multi-channel Convolutional Bidirectional Long Short-Term Memory Network

Long Zhang, Jia Jia, Fanbo Meng, Suping Zhou, Wei Chen, Cunjun Zhang, Runnan Li

2018

Emphasis detection is important for understanding user intention in human-computer interaction scenarios. Techniques have been developed to detect emphatic words in speech, but challenges remain in Voice Dialogue Applications (VDAs): the enormous number of non-specific speakers and the variety of their expressions. In this work, we present a novel approach to automatically detect emphasis in VDAs using multi-channel convolutional bidirectional long short-term memory neural networks (MC-BLSTM), which can learn the varied expressions of large numbers of speakers as well as long-span temporal dependencies across speech trajectories. In particular, the proposed approach first uses a multi-channel convolutional component to extract a high-level representation of the input acoustic features for emphasis detection. Experimental results on a real-world dataset of 3400 samples collected from Sogou Voice Assistant (http://yy.sogou.com) show that the approach outperforms current state-of-the-art baseline systems (+6.2% in terms of F1-measure on average).
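
The abstract does not give layer sizes, kernel widths, or the exact acoustic features, so the following is only a minimal NumPy sketch of an MC-BLSTM-style forward pass under stated assumptions, not the authors' implementation. "Multi-channel" is interpreted here as parallel 1-D convolution branches with different kernel widths (1, 3 and 5, chosen arbitrarily) whose ReLU outputs are concatenated. A bidirectional LSTM then models temporal context in both directions, and a sigmoid layer emits one emphasis probability per time step. Weights are random and untrained, and every name (`init_params`, `mc_blstm_forward`, etc.) is hypothetical.

```python
import numpy as np


def conv1d_same(x, w, b):
    """1-D convolution over time with zero 'same' padding.
    x: (T, D_in), w: (K, D_in, D_out), b: (D_out,) -> (T, D_out)."""
    K = w.shape[0]
    pad_left = (K - 1) // 2
    xp = np.pad(x, ((pad_left, K - 1 - pad_left), (0, 0)))
    out = np.empty((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        # Sum over the K-frame window and all input features.
        out[t] = np.einsum("kd,kdo->o", xp[t:t + K], w) + b
    return out


def multi_channel_conv(x, branches):
    # Assumed design: parallel branches with different kernel widths,
    # each followed by ReLU, concatenated along the feature axis.
    return np.concatenate(
        [np.maximum(conv1d_same(x, w, b), 0.0) for w, b in branches], axis=1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm(x, W, U, b):
    """Unidirectional LSTM. x: (T, D), W: (D, 4H), U: (H, 4H), b: (4H,) -> (T, H)."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    outs = []
    for x_t in x:
        z = x_t @ W + h @ U + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g          # cell state update
        h = o * np.tanh(c)         # hidden state
        outs.append(h)
    return np.stack(outs)


def blstm(x, fwd, bwd):
    h_f = lstm(x, *fwd)
    h_b = lstm(x[::-1], *bwd)[::-1]  # run backward in time, then re-align
    return np.concatenate([h_f, h_b], axis=1)


def init_params(d_in, kernel_widths=(1, 3, 5), conv_out=8, hidden=16, seed=0):
    # Hypothetical sizes; random, untrained weights.
    rng = np.random.default_rng(seed)
    branches = [(rng.normal(0, 0.1, (k, d_in, conv_out)), np.zeros(conv_out))
                for k in kernel_widths]
    d_conv = conv_out * len(kernel_widths)

    def lstm_params():
        return (rng.normal(0, 0.1, (d_conv, 4 * hidden)),
                rng.normal(0, 0.1, (hidden, 4 * hidden)),
                np.zeros(4 * hidden))

    return {"branches": branches, "fwd": lstm_params(), "bwd": lstm_params(),
            "w_out": rng.normal(0, 0.1, (2 * hidden,)), "b_out": 0.0}


def mc_blstm_forward(x, p):
    feats = multi_channel_conv(x, p["branches"])
    h = blstm(feats, p["fwd"], p["bwd"])
    return sigmoid(h @ p["w_out"] + p["b_out"])  # (T,) emphasis probabilities


# Usage: 20 time steps of hypothetical 13-dimensional acoustic features.
x = np.random.default_rng(1).normal(size=(20, 13))
probs = mc_blstm_forward(x, init_params(d_in=13))
print(probs.shape)  # (20,)
```

In a trained system, these per-step probabilities would presumably be thresholded, or pooled over each word's frames, to label emphatic words. That step and training are omitted from this sketch.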
