Counterfactual Off-Policy Training for Neural Response Generation.

Qingfu Zhu, Wei-Nan Zhang, Ting Liu, William Yang Wang

2020

Learning a neural response generation model on data synthesized under the adversarial training framework helps to explore more possible responses. However, most of the data synthesized de novo are of low quality because the response space is vast. In this paper, we propose a counterfactual off-policy method that learns from better synthesized data. It uses a structural causal model to take a real response and infer an alternative response that was not taken. Learning on these counterfactual responses helps to explore the high-reward area of the response space. An empirical study on the DailyDialog dataset shows that our approach significantly outperforms the HRED model as well as conventional adversarial training approaches.
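The idea of inferring "an alternative that was not taken" from a real response can be illustrated with a small sketch. The abstract does not say which form the structural causal model takes, so the example below assumes a Gumbel-Max formulation for a single token choice: the observed token is treated as the argmax of log-probabilities plus Gumbel noise, the noise is sampled conditioned on that observation, and the same noise is then replayed under a different policy. All function names, the three-token vocabulary, and the probabilities are hypothetical and are not taken from the paper.

```python
import math
import random


def sample_gumbel(rng, loc=0.0):
    """Draw one sample from a Gumbel(loc, 1) distribution."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
    return loc - math.log(-math.log(u))


def posterior_gumbel_noise(probs, observed, rng):
    """Sample noise g such that argmax_i(log probs[i] + g[i]) == observed.

    Assumes every entry of probs is strictly positive.
    """
    logits = [math.log(p) for p in probs]
    # The maximum perturbed logit follows Gumbel(log-sum of probs).
    top = sample_gumbel(rng, math.log(sum(probs)))
    noise = []
    for i, phi in enumerate(logits):
        if i == observed:
            value = top  # the observed token attains the maximum
        else:
            # Gumbel(phi) truncated to lie below the maximum.
            g = sample_gumbel(rng, phi)
            value = -math.log(math.exp(-top) + math.exp(-g))
        noise.append(value - phi)
    return noise


def counterfactual_token(behaviour_probs, observed, target_probs, rng):
    """Replay the inferred noise under a different (target) policy."""
    noise = posterior_gumbel_noise(behaviour_probs, observed, rng)
    scores = [math.log(q) + g for q, g in zip(target_probs, noise)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    behaviour = [0.5, 0.3, 0.2]  # hypothetical policy that produced the real token
    target = [0.1, 0.1, 0.8]     # hypothetical policy being trained
    print(counterfactual_token(behaviour, 0, target, rng))
```

Under this assumption, replaying the noise with the target policy equal to the behaviour policy always reproduces the observed token, so the counterfactual differs from the real response only to the extent that the two policies differ.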

Keywords:

Machine learning
Adversarial system
Empirical research
Response generation
Counterfactual thinking
Causal
Artificial intelligence
Mathematics
