Resolving Abstract Anaphora In Conversational Assistants Using A Hierarchically-stacked RNN

Authors:
Prerna Khurana, Tata Consultancy Services
Puneet Agarwal, Tata Consultancy Services
Gautam Shroff, Tata Consultancy Services
Lovekesh Vig, Tata Consultancy Services

Introduction:

This paper studies the resolution of abstract anaphora in conversational assistants. The authors propose a novel solution that uses a hierarchical neural network, comprising a BiLSTM layer and a max-pooling layer, stacked hierarchically to first obtain a representation of each user utterance and then a representation of the sequence of utterances.
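A minimal sketch of the first-level (utterance) encoder this describes, written in PyTorch. This is not the authors' code: the class name and all dimensions (vocab_size, emb_dim, hidden_dim) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class UtteranceEncoder(nn.Module):
    """First level of the hierarchy: a BiLSTM over word embeddings,
    max-pooled over time to give one fixed-size vector per utterance.
    Hyperparameters are assumptions, not the paper's settings."""
    def __init__(self, vocab_size=10000, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, tokens):
        # tokens: (batch, seq_len) word indices for a single utterance
        x = self.embedding(tokens)   # (batch, seq_len, emb_dim)
        h, _ = self.bilstm(x)        # (batch, seq_len, 2 * hidden_dim)
        u, _ = h.max(dim=1)          # max-pool over the time dimension
        return u                     # (batch, 2 * hidden_dim) utterance vector
```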

Abstract:

The recent proliferation of conversational systems has resulted in an increased demand for more natural dialogue systems, capable of more sophisticated interactions than merely providing factual answers. This is evident from the usage patterns of a conversational system deployed within our organization. Users expect it to resolve co-references not only of anaphora, but also of the antecedent or posterior facts presented by users with respect to their query. The presence of such facts in a conversation sometimes modifies the answer to the main query; e.g., the answer to ‘how many sick leave do I get?’ would be different when the fact ‘I am on contract’ is also present. Sometimes three or four such facts need to be resolved collectively. In this paper, we propose a novel solution that uses a hierarchical neural network, comprising a BiLSTM layer and a max-pooling layer, stacked hierarchically to first obtain a representation of each user utterance and then a representation of the sequence of utterances. This representation is used to identify the user’s intention. We also improve this model by using skip connections in the second network to allow better gradient flow. Our model not only a) resolves the antecedent and posterior facts, but also b) performs better even on self-contained queries. It is also c) faster to train, making it the most promising approach for use in our environment, where frequent training and tuning is needed. It also d) slightly outperforms the benchmark on a publicly available dataset, and e) performs better than obvious baseline approaches on our datasets.
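One plausible reading of the second network described above, again as a sketch rather than the published implementation: the utterance vectors feed a second BiLSTM followed by max-pooling, with a skip connection that concatenates the layer's input to its output before pooling so gradients can bypass the recurrent layer, and the pooled dialogue vector drives intent classification. The class name and hyperparameters below are hypothetical, and the exact placement of the skip connection in the paper may differ.

```python
import torch
import torch.nn as nn

class DialogueIntentModel(nn.Module):
    """Second level of the hierarchy: a BiLSTM + max-pool over the
    sequence of utterance vectors. The skip connection (concatenating
    the layer's input to its output before pooling) is one plausible
    realization of the 'better gradient flow' idea."""
    def __init__(self, utt_dim=256, hidden_dim=128, n_intents=50):
        super().__init__()
        self.bilstm = nn.LSTM(utt_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim + utt_dim, n_intents)

    def forward(self, utterance_vectors):
        # utterance_vectors: (batch, n_utterances, utt_dim), e.g. produced
        # by the UtteranceEncoder sketched earlier
        h, _ = self.bilstm(utterance_vectors)          # (batch, n_utt, 2 * hidden_dim)
        h = torch.cat([h, utterance_vectors], dim=-1)  # skip connection around the BiLSTM
        d, _ = h.max(dim=1)                            # max-pool over utterances
        return self.classifier(d)                      # intent logits
```

In use, each utterance in a conversation would first be encoded by UtteranceEncoder, and the resulting sequence of vectors passed to DialogueIntentModel; the predicted intent then reflects any antecedent or posterior facts present in the conversation.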

You may want to know: