GoChat: Goal-oriented Chatbots with Hierarchical Reinforcement Learning

2020 
A chatbot that converses like a human should be goal-oriented (i.e., purposeful in conversation), which goes beyond language generation. However, existing goal-oriented dialogue systems often rely heavily on cumbersome hand-crafted rules or costly labelled datasets, which limits their applicability. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for training a chatbot end-to-end to maximize the long-term return from offline multi-turn dialogue datasets. Our framework uses hierarchical reinforcement learning (HRL), where the high-level policy determines sub-goals that guide the conversation towards the final goal, and the low-level policy fulfills each sub-goal by generating the corresponding response utterance. In experiments on a real-world dialogue dataset for anti-fraud in finance, our approach outperforms previous methods in both the quality of response generation and the success rate of accomplishing the goal.
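The two-level control flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the learned policies are replaced by fixed toy rules, and the sub-goal names, templates, reward function, and discount factor are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sub-goal labels; the abstract does not list the actual ones.
SUB_GOALS = ["greet", "verify_identity", "ask_transaction", "close"]

# Hypothetical response templates standing in for a learned generator.
TEMPLATES = {
    "greet": "Hello, this is the risk control team.",
    "verify_identity": "Could you confirm your name?",
    "ask_transaction": "Did you make this transfer yourself?",
    "close": "Thank you, that is all we need.",
}


@dataclass
class Turn:
    sub_goal: str
    utterance: str
    reward: float


def high_level_policy(history: List[Turn]) -> str:
    """Toy stand-in for the high-level policy: walk the sub-goals in order."""
    return SUB_GOALS[min(len(history), len(SUB_GOALS) - 1)]


def low_level_policy(sub_goal: str, user_utterance: str) -> str:
    """Toy stand-in for the low-level policy: realize the sub-goal as text.

    A learned generator would also condition on user_utterance; this
    template lookup ignores it.
    """
    return TEMPLATES[sub_goal]


def run_episode(user_utterances: List[str],
                reward_fn: Callable[[str], float]) -> List[Turn]:
    """Run one dialogue: pick a sub-goal each turn, then generate a reply."""
    history: List[Turn] = []
    for user_utterance in user_utterances:
        sub_goal = high_level_policy(history)
        utterance = low_level_policy(sub_goal, user_utterance)
        history.append(Turn(sub_goal, utterance, reward_fn(sub_goal)))
    return history


def discounted_return(rewards: List[float], gamma: float = 0.9) -> float:
    """Long-term return: sum of gamma**t * reward_t over the episode."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Assumed sparse reward: 1.0 only when the final sub-goal is reached.
    episode = run_episode(["hi", "ok", "yes", "bye"],
                          lambda g: 1.0 if g == "close" else 0.0)
    for turn in episode:
        print(turn.sub_goal, "->", turn.utterance)
    print(discounted_return([t.reward for t in episode]))
```

In an actual HRL system both policies would be trained, with the high-level policy optimized for the long-term return and the low-level policy for fulfilling the chosen sub-goal; the fixed rules here only show how the two levels hand off to each other.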