Emphasis Detection for Voice Dialogue Applications Using Multi-channel Convolutional Bidirectional Long Short-Term Memory Network

2018 
Emphasis detection is important for understanding user intention in human-computer interaction scenarios. Techniques have been developed to detect emphatic words in speech, but challenges remain in Voice Dialogue Applications (VDAs): the very large number of non-specific speakers and the variety of their expressions. In this work, we present a novel approach to automatically detect emphasis in VDAs using multi-channel convolutional bidirectional long short-term memory neural networks (MC-BLSTM), which can learn the varied expressions of a large number of speakers as well as long-span temporal dependencies across speech trajectories. In particular, we first use a multi-channel convolutional component in the proposed approach to extract a high-level representation of the input acoustic features for emphasis detection. Experimental results on a 3400 real-world dataset collected from Sogou Voice Assistant (http://yy.sogou.com) show that the proposed approach outperforms current state-of-the-art baseline systems (+6.2% in terms of F1-measure on average).
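The reported gain is stated in terms of F1-measure. As a minimal sketch of how such a score can be computed, the snippet below assumes word-level binary labels (1 = emphasized, 0 = not emphasized). The abstract does not specify the evaluation granularity, so the function name `f1_measure`, the labeling scheme, and the example labels are hypothetical illustrations, not the authors' evaluation code.

```python
def f1_measure(predicted, reference):
    """F1 for binary emphasis labels (assumed: 1 = emphasized, 0 = not)."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)  # correctly detected emphasis
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)  # false alarms
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)  # missed emphasis

    if tp == 0:
        # No correct detections: precision or recall is zero (or undefined),
        # so F1 is reported as 0.0 by convention.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for a six-word utterance (illustration only).
reference = [0, 1, 0, 0, 1, 0]
predicted = [0, 1, 1, 0, 0, 0]
score = f1_measure(predicted, reference)
```

Because emphasized words are usually a minority of all words, F1 is more informative here than plain accuracy, which a detector could inflate by never predicting emphasis.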