Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation

Authors:
Tianyu He, University of Science and Technology of China
Xu Tan, Microsoft Research
Yingce Xia, Microsoft Research
Di He, Peking University
Tao Qin, Microsoft Research
Zhibo Chen, University of Science and Technology of China
Tie-Yan Liu, Microsoft Research Asia

Introduction:

Neural Machine Translation (NMT) has achieved remarkable progress with the rapid evolution of model structures. In this paper, the authors propose the concept of layer-wise coordination for NMT, which explicitly coordinates the learning of hidden representations of the encoder and decoder, layer by layer, gradually from low level to high level.

Abstract:

Neural Machine Translation (NMT) has achieved remarkable progress with the rapid evolution of model structures. In this paper, we propose the concept of layer-wise coordination for NMT, which explicitly coordinates the learning of hidden representations of the encoder and decoder together layer by layer, gradually from low level to high level. Specifically, we design a layer-wise attention and mixed attention mechanism, and further share the parameters of each layer between the encoder and decoder to regularize and coordinate the learning. Experiments show that, combined with the state-of-the-art Transformer model, layer-wise coordination achieves improvements on three IWSLT and two WMT translation tasks. More specifically, our method achieves BLEU scores of 34.43 and 29.01 on the WMT16 English-Romanian and WMT14 English-German tasks, outperforming the Transformer baseline.
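The following PyTorch code is a minimal, hypothetical sketch of this idea, not the authors' implementation. It assumes that "mixed attention" can be modeled as a single multi-head attention in which each target position attends jointly to the source representations of the same layer and to the causally masked target prefix, and it shares each layer's parameters between the encoder and decoder paths by routing both through one module. The class names (CoordinatedLayer, LayerWiseCoordinatedModel) and hyperparameters are illustrative assumptions.

# Minimal sketch of layer-wise coordination (illustrative only).
import torch
import torch.nn as nn

class CoordinatedLayer(nn.Module):
    """One layer whose parameters are shared by the encoder and decoder paths."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout,
                                          batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def encode(self, src):
        # Encoder path: ordinary self-attention over the source sequence.
        h = self.norm1(src + self.attn(src, src, src, need_weights=False)[0])
        return self.norm2(h + self.ffn(h))

    def decode(self, tgt, src_same_layer, causal_mask):
        # Decoder path: mixed attention over [source of this layer ; target],
        # so each decoder layer coordinates with the encoder layer of the same depth.
        mem = torch.cat([src_same_layer, tgt], dim=1)
        s, t = src_same_layer.size(1), tgt.size(1)
        # Target positions may attend to all source tokens plus the target prefix.
        mask = torch.zeros(t, s + t, dtype=torch.bool, device=tgt.device)
        mask[:, s:] = causal_mask
        h = self.norm1(tgt + self.attn(tgt, mem, mem, attn_mask=mask,
                                       need_weights=False)[0])
        return self.norm2(h + self.ffn(h))

class LayerWiseCoordinatedModel(nn.Module):
    def __init__(self, n_layers=6, d_model=512):
        super().__init__()
        # A single stack of layers used by both encoder and decoder (weight sharing).
        self.layers = nn.ModuleList([CoordinatedLayer(d_model) for _ in range(n_layers)])

    def forward(self, src, tgt):
        t = tgt.size(1)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=tgt.device), 1)
        for layer in self.layers:
            src_next = layer.encode(src)                # encoder representation at this layer
            tgt = layer.decode(tgt, src_next, causal)   # decoder coordinates with the same layer
            src = src_next
        return tgt

In a full model, token embeddings, positional encodings, and the output softmax would wrap this module; the sketch only illustrates the per-layer coordination and the sharing of layer parameters between the two paths.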
