Gated Residual Connection for Neural Machine Translation

2019 
Since it was proposed, the Transformer framework has demonstrated both its suitability for parallel computation and its effectiveness at modeling word dependencies. To mitigate the exploding and vanishing gradient problem, the Transformer adopts residual connections between the layers in a stack. However, a standard residual connection simply adds the layer's input and output element-wise. In this work, we focus on improving the residual connection by regulating how much of the input and output flows through it, rather than relying on a plain sum. To preserve the simplicity and flexibility of the Transformer framework, the gating signal is derived from the internal representations of the layer output. Experiments on the WMT14 English-German translation dataset demonstrate the effectiveness of the proposed method.
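As a rough illustration of the idea, the sketch below (PyTorch) replaces the plain element-wise sum with a gated combination of a sublayer's input and output. The sigmoid projection used for the gate is a hypothetical choice, since the abstract only states that the gate is derived from the output's internal representations; the paper's exact formulation may differ.

    # Minimal sketch of a gated residual connection, assuming the gate is a
    # sigmoid projection of the sublayer output (hypothetical formulation).
    import torch
    import torch.nn as nn

    class GatedResidual(nn.Module):
        def __init__(self, d_model):
            super().__init__()
            # Projection mapping the sublayer output to a per-dimension gate.
            self.gate_proj = nn.Linear(d_model, d_model)

        def forward(self, x, sublayer_out):
            # Gate in (0, 1) computed from the sublayer's output representation.
            g = torch.sigmoid(self.gate_proj(sublayer_out))
            # Weighted combination instead of the plain sum x + sublayer_out.
            return g * sublayer_out + (1.0 - g) * x

In a Transformer stack, such a module would be used in place of the usual "x + Sublayer(x)" step around each attention or feed-forward sublayer, letting the model learn how much of the input versus the output to propagate.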