Can We Learn What People Are Doing From Raw DNS Queries?

Authors:
Jianfeng Li Xi'an Jiaotong University, P.R. China
Xiaobo Ma Xi'an Jiaotong University & The Hong Kong Polytechnic University, P.R. China
Li Guodong Xi'an Jiaotong University, P.R. China
Xiapu Luo The Hong Kong Polytechnic University, Hong Kong
Junjie Zhang Wright State University, USA
Wei Li Xi'an Jiaotong University, P.R. China
Xiaohong Guan Xi'an Jiaotong University & Tsinghua University, P.R. China

Abstract:

The Domain Name System (DNS) is one of the pillars of today's Internet. Because of its appealing properties, such as low data volume, wide-ranging applications, and lack of encryption, DNS traffic has been extensively used for network monitoring. Most existing studies of DNS traffic, however, focus on domain name reputation. Little attention has been paid to understanding and profiling what people are doing from DNS traffic, a fundamental problem in areas including Internet demographics and network behavior analysis. Consequently, simple questions like "How can we determine whether a DNS query for www.google.com means searching or some other behavior?" cannot be answered by existing studies. In this paper, we take the first step toward identifying user activities from raw DNS queries. We present a multi-scale hierarchical framework to tackle two practical challenges, namely behavior ambiguity and behavior polymorphism. Under this framework, a series of novel methods, such as pattern upward mapping and a multi-scale random forest classifier, are proposed to characterize and identify user activities of interest. Evaluation using both synthetic and real-world DNS traces demonstrates the effectiveness of our method.
