Semi-Supervised Neural Machine Translation via Marginal Distribution Estimation

2019 
Neural machine translation (NMT) heavily relies on parallel bilingual corpora for training. Since large-scale, high-quality parallel corpora are usually costly to collect, it is appealing to exploit monolingual corpora to improve NMT. Inspired by the law of total probability, which connects the probability of a given target-side monolingual sentence to the conditional probability of translating from a source sentence to the target one, we propose to explicitly exploit this connection and improve the training of NMT models using monolingual data. The key technical challenge of this approach is that computing the marginal probability of a target monolingual sentence requires summing the conditional probability over exponentially many candidate source sentences. We address this challenge by leveraging the reverse translation model (target-to-source translation model) to sample several of the most likely source-side sentences, thereby avoiding enumeration of all possible candidate source sentences. We then propose two different methods to leverage the law of total probability: marginal distribution regularization and likelihood maximization on monolingual corpora. Experimental results on English $\rightarrow$ French and German $\rightarrow$ English tasks demonstrate that our methods achieve significant improvements over several strong baselines.
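To make the core idea concrete, here is a minimal sketch of how the marginal probability of a target monolingual sentence might be estimated by importance sampling from the reverse translation model, as the abstract describes. The model interfaces (`reverse_model`, `forward_model`, `lm_prior` and their `.sample(...)` / `.log_prob(...)` methods) are illustrative assumptions, not the authors' actual API.

```python
import torch

def marginal_log_prob(y, reverse_model, forward_model, lm_prior, k=5):
    """Hypothetical sketch: estimate log p(y) for a target-side
    monolingual sentence y via the law of total probability,

        p(y) = sum_x p(x) p(y|x),

    approximated by importance sampling from the reverse
    (target-to-source) model q(x|y):

        p(y) ~= (1/K) * sum_k p(x_k) p(y|x_k) / q(x_k|y),  x_k ~ q(.|y)

    so that all candidate source sentences need not be enumerated.
    """
    log_weights = []
    for _ in range(k):
        # Sample a likely source sentence and its proposal probability.
        x, log_q = reverse_model.sample(y)          # log q(x_k|y)
        log_px = lm_prior.log_prob(x)               # log p(x): source-side language model
        log_pyx = forward_model.log_prob(y, src=x)  # log p(y|x): forward NMT model
        log_weights.append(log_px + log_pyx - log_q)
    # Log-mean-exp over the K importance weights gives log p(y).
    w = torch.stack(log_weights)
    return torch.logsumexp(w, dim=0) - torch.log(torch.tensor(float(k)))
```

Under this sketch, the two proposed training methods would correspond to different uses of the estimate: penalizing the gap between the estimated marginal and an empirical/language-model distribution over target sentences (marginal distribution regularization), or directly maximizing the estimated log-likelihood of the monolingual corpus.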