Exploring attention mechanisms based on summary information for end-to-end automatic speech recognition

2021 
Abstract Recent studies have confirmed that attention mechanisms with a location constraint strategy help reduce misrecognitions caused by incorrect alignments in attention-based end-to-end automatic speech recognition (E2E ASR) systems. The key advantage of these mechanisms is that they account for the monotonicity of the alignment by employing a location constraint vector. In most such mechanisms, this vector is obtained directly from historical attention scores. However, when a historical attention score is inaccurate, the resulting vector may itself become a source of interference, and this interference then propagates through subsequent attention scoring. To address this problem, we derive a more reliable location constraint vector from the matching relationship between the historical output information and summary information, where the summary information captures both the content and the temporal structure of the speech sequence. We further propose an enhanced location-constrained attention mechanism, the summary constrained (SC) attention mechanism, which generates this vector with a neural network based on the matching relationship. The summary information is represented by a summary subspace embedding learned through a linear subspace projection. In addition, exploiting the complementarity of the SC and typical location-constrained attention mechanisms, a fused attention mechanism combines the two to generate a more reasonable constraint vector. E2E ASR systems based on the SC and fused attention mechanisms were evaluated on the Switchboard conversational telephone speech recognition task. The experimental results show that our mechanisms achieve relative word error rate reductions of 10.6% and 16.7%, respectively, over the baseline system.
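The abstract does not give the paper's equations, but the "typical location-constrained attention" it builds on can be illustrated with a standard location-aware additive attention step, in which the location constraint vector is obtained by convolving the previous step's attention weights. The following NumPy sketch is a minimal toy version under that assumption; all dimensions, parameter names (`W_s`, `W_h`, `w_f`, `v`, `conv`), and the additive scoring form are illustrative, not the paper's actual formulation (the SC mechanism would instead derive the constraint from matching decoder history against a summary subspace embedding).

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def location_aware_attention(s, H, prev_alpha, params):
    """One toy step of location-constrained additive attention.

    s          : decoder state, shape (d,)
    H          : encoder states, shape (T, d)
    prev_alpha : previous attention weights, shape (T,); the source of
                 the location constraint vector in this typical variant
    params     : dict of hypothetical projection weights (assumptions)
    """
    W_s, W_h, w_f, v, conv = (params[k] for k in ("W_s", "W_h", "w_f", "v", "conv"))
    # Location constraint features: convolve the previous attention weights
    f = np.convolve(prev_alpha, conv, mode="same")              # (T,)
    # Additive scoring: e_t = v . tanh(W_s s + W_h h_t + f_t w_f)
    e = np.tanh(H @ W_h.T + s @ W_s.T + np.outer(f, w_f)) @ v   # (T,)
    return softmax(e)

# Toy usage with random parameters
d, T = 4, 6
params = dict(
    W_s=rng.normal(size=(d, d)),
    W_h=rng.normal(size=(d, d)),
    w_f=rng.normal(size=(d,)),
    v=rng.normal(size=(d,)),
    conv=np.array([0.25, 0.5, 0.25]),  # smooths the previous alignment
)
H = rng.normal(size=(T, d))
s = rng.normal(size=(d,))
prev_alpha = np.eye(T)[2]  # previous step attended to frame 2
alpha = location_aware_attention(s, H, prev_alpha, params)
```

Because `f` is built purely from `prev_alpha`, a badly placed previous alignment directly distorts the next scores, which is exactly the interference the abstract's summary-constrained variant is designed to avoid.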