|Jinfeng Rao||University of Maryland|
|Ferhan Ture||Comcast Applied AI Research Lab|
|Jimmy Lin||University of Waterloo|
This paper tackles the challenge of understanding voice queries posed against the Comcast Xfinity X1 entertainment platform. The authors present a novel multi-task neural architecture that jointly learns three related tasks: program prediction, intent classification, and query tagging.
We tackle the challenge of understanding voice queries posed against the Comcast Xfinity X1 entertainment platform, where consumers direct speech input at their “voice remotes”. Such queries range from specific program navigation (e.g., watch a movie) to requests with vague intents and even queries that have nothing to do with watching TV. We present successively richer neural network architectures to tackle this challenge based on two key insights. The first is that session context can be exploited to disambiguate queries and recover from ASR errors, which we operationalize with hierarchical recurrent neural networks. The second is that query understanding requires evidence integration across multiple related tasks, which we identify as program prediction, intent classification, and query tagging. We present a novel multi-task neural architecture that jointly learns to accomplish all three tasks. Our initial model, already deployed in production, serves millions of queries daily with an improved customer experience. The novel multi-task learning model, first described here, is evaluated through carefully controlled laboratory experiments, which demonstrate further gains in effectiveness and increased system capabilities.