| Author | Affiliation |
|---|---|
| Jianfeng Li | Xi'an Jiaotong University, P.R. China |
| Xiaobo Ma | Xi'an Jiaotong University & The Hong Kong Polytechnic University, P.R. China |
| Guodong Li | Xi'an Jiaotong University, P.R. China |
| Xiapu Luo | The Hong Kong Polytechnic University, Hong Kong |
| Junjie Zhang | Wright State University, USA |
| Wei Li | Xi'an Jiaotong University, P.R. China |
| Xiaohong Guan | Xi'an Jiaotong University & Tsinghua University, P.R. China |
The Domain Name System (DNS) is one of the pillars of today's Internet. Owing to appealing properties such as low data volume, wide-ranging applications, and the absence of encryption, DNS traffic has been used extensively for network monitoring. Most existing studies of DNS traffic, however, focus on domain name reputation. Little attention has been paid to understanding and profiling what people are doing from their DNS traffic, a fundamental problem in areas such as Internet demographics and network behavior analysis. Consequently, existing studies cannot answer simple questions such as "Does a DNS query for www.google.com indicate searching or some other behavior?" In this paper, we take the first step toward identifying user activities from raw DNS queries. We propose a multi-scale hierarchical framework that tackles two practical challenges: behavior ambiguity and behavior polymorphism. Within this framework, we propose a series of novel methods, such as pattern upward mapping and a multi-scale random forest classifier, to characterize and identify user activities of interest. Evaluations on both synthetic and real-world DNS traces demonstrate the effectiveness of our method.