A Comparison of Update Strategies for Large-Scale Maximum Expected BLEU Training

Joern Wuebker, Sebastian Muehr, Patrick Lehnen, Stephan Peitz, Hermann Ney

2015

This work presents a flexible and efficient discriminative training approach for statistical machine translation. We propose to use the RPROP algorithm for optimizing a maximum expected BLEU objective and experimentally compare it to several other updating schemes. It proves to be more efficient and effective than the previously proposed growth transformation technique and also yields better results than stochastic gradient descent and AdaGrad. We also report strong empirical results on two large-scale tasks, namely BOLT Chinese→English and WMT German→English, where our final systems outperform results reported by Setiawan and Zhou (2013) and on matrix.statmt.org. On the WMT task, discriminative training is performed on the full training data of 4M sentence pairs, which is unsurpassed in the literature.

Keywords:

Machine learning
BLEU
Training set
Machine translation
Stochastic gradient descent
Artificial intelligence
Discriminative model
Sentence
Computer science
Rprop
growth transformation
rprop algorithm
Data mining
