| Author | Affiliation |
|---|---|
| Haolan Chen | Mobile Internet Group, Tencent |
| Fred X. Han | University of Alberta |
| Di Niu | University of Alberta |
| Dong Liu | Mobile Internet Group, Tencent |
| Kunfeng Lai | Mobile Internet Group, Tencent |
| Chenglin Wu | Mobile Internet Group, Tencent |
| Yu Xu | Mobile Internet Group, Tencent |
This paper studies short text matching. The authors present the design of Multi-Channel Information Crossing (MIX), a multi-channel convolutional neural network model for text matching, with additional attention mechanisms derived from sentence and text semantics.
Short text matching plays an important role in many natural language processing tasks, such as information retrieval, question answering, and conversational systems. Conventional text matching methods rely on predefined templates and rules, which are ill-suited to short texts containing only a few words and which generalize poorly to unseen data. Many recent efforts have applied deep neural network models to natural language processing tasks, reducing the cost of feature engineering. In this paper, we present the design of Multi-Channel Information Crossing (MIX), a multi-channel convolutional neural network model for text matching, with additional attention mechanisms derived from sentence and text semantics. MIX compares text snippets at varied granularities to form a series of multi-channel similarity matrices, which are then crossed with a second set of carefully designed attention matrices to expose the rich structure of sentences to deep neural networks. We implemented MIX and deployed the system on Tencent’s Venus distributed computation platform. Evaluation results on the English WikiQA dataset suggest that, thanks to this carefully engineered multi-channel information crossing, MIX outperforms a wide range of state-of-the-art deep neural network models by at least 11.1% in terms of normalized discounted cumulative gain (NDCG). We also performed online A/B tests with real users on the search service of Tencent QQ Browser. The results suggest that MIX increased the number of clicks on returned results by 5.7% owing to more accurate query-document matching, demonstrating the superior performance of MIX in production environments.
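The idea of crossing similarity matrices with attention matrices can be illustrated with a small, pure-Python sketch. It is not the MIX implementation, and every modeling choice here is an assumption: an n-gram is represented as the mean of its word vectors, similarity is cosine, each attention matrix is the outer product of hypothetical per-word weights (for example, term-importance scores), and "crossing" is taken to be an element-wise product. The names `build_channels`, `EMB`, and the weight values are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ngram_vectors(vecs: List[Vector], n: int) -> List[Vector]:
    """Assumed granularity: each n-gram is the mean of its word vectors."""
    return [
        [sum(col) / n for col in zip(*vecs[i:i + n])]
        for i in range(len(vecs) - n + 1)
    ]


def ngram_weights(weights: List[float], n: int) -> List[float]:
    """Average the (hypothetical) per-word weights inside each n-gram window."""
    return [sum(weights[i:i + n]) / n for i in range(len(weights) - n + 1)]


def cross(sim: Matrix, att: Matrix) -> Matrix:
    """Element-wise (Hadamard) product of a similarity matrix and an attention matrix."""
    return [[s * a for s, a in zip(row_s, row_a)] for row_s, row_a in zip(sim, att)]


def build_channels(
    q_vecs: List[Vector], d_vecs: List[Vector],
    q_w: List[float], d_w: List[float],
    granularities: Tuple[int, ...] = (1, 2),
) -> List[Matrix]:
    """Return one crossed matrix per granularity (unigrams, bigrams, ...)."""
    channels = []
    for n in granularities:
        q, d = ngram_vectors(q_vecs, n), ngram_vectors(d_vecs, n)
        sim = [[cosine(x, y) for y in d] for x in q]   # similarity channel
        qw, dw = ngram_weights(q_w, n), ngram_weights(d_w, n)
        att = [[a * b for b in dw] for a in qw]         # attention channel (outer product)
        channels.append(cross(sim, att))
    return channels


# Hypothetical toy embeddings and weights, for illustration only.
EMB = {
    "cheap": [1.0, 0.0, 0.0], "flights": [0.0, 1.0, 0.0],
    "low": [0.9, 0.1, 0.0], "cost": [0.8, 0.2, 0.0], "airfare": [0.1, 0.9, 0.0],
}
query, doc = ["cheap", "flights"], ["low", "cost", "airfare"]
channels = build_channels(
    [EMB[t] for t in query], [EMB[t] for t in doc],
    q_w=[1.0, 1.0], d_w=[0.5, 0.5, 1.0],
)
```

Here the unigram channel is a 2×3 matrix and the bigram channel is 1×2; the lower weights on "low" and "cost" damp their cells even where cosine similarity is high. In this sketch, channels at different granularities have different shapes, so a model that stacks them as CNN input channels would need padding or another alignment step that is not shown.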
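NDCG, the metric behind the WikiQA comparison, rewards rankings that place relevant items near the top. The sketch below assumes the common formulation with gain 2^rel - 1 and a log2 position discount; the cutoff k used in the evaluation is not stated here, so it is left as a free parameter. With binary relevance labels, the gain reduces to 1 for a relevant item and 0 otherwise.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the top-k items, using the (2^rel - 1) gain."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so the top item is divided by log2(2) = 1
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (relevance-sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Binary labels of candidate answers, listed in the order a model ranked them.
print(round(ndcg_at_k([0, 1, 0], k=3), 4))  # prints 0.6309
```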