|Pinghui Wang||Xi'an Jiaotong University, P.R. China|
|Peng Jia||Xi'an Jiaotong University, P.R. China|
|Jing Tao||Xi'an Jiaotong University, P.R. China|
|Xiaohong Guan||Xi'an Jiaotong University & Tsinghua University, P.R. China|
Mining user behaviors over high-speed links is important for applications such as network anomaly detection. Previous work focuses on monitoring anomalies such as extremely frequent users that occur within a short timeslot (e.g., 1 minute). Little attention has been paid to detecting users with stealthy behaviors, such as persistent frequent and co-occurrence behaviors, over a long period of time at the timeslot granularity (e.g., 1-minute granularity). Unlike frequent users, persistent users do not necessarily occur more frequently than other users in a single timeslot, but they persist and occur in a larger number of timeslots. Because routers have limited computation and storage resources, collecting massive network traffic over a long period of time is prohibitively expensive. We develop an end-to-end method that addresses challenges in both long-term online traffic collection and offline user behavior analysis. To achieve this goal, we design a user embedding (UE) method that quickly builds compact sketches of user-occurrence events over time. To reduce the estimation error introduced by the Bloom filter, we model UE as a sampling method and propose methods to accurately mine a variety of user behaviors from user-occurrence events rebuilt from UE sketches. In addition, we introduce another embedding method, reversible UE (RUE), to detect persistent frequent behaviors when the IDs of monitored users are not given in advance for offline analysis. We conduct extensive experiments on real-world traffic, and the results demonstrate that our methods significantly outperform state-of-the-art methods.
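The distinction between frequent and persistent users can be illustrated with a minimal Python sketch. It uses exact counting on a toy event list and is not the UE or RUE method described above; the function names, the `(timeslot, user)` event format, and the sample data are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def frequency_per_slot(events):
    """Return {slot: Counter(user -> occurrences within that slot)}."""
    per_slot = defaultdict(Counter)
    for slot, user in events:
        per_slot[slot][user] += 1
    return per_slot


def persistence(events):
    """Return Counter(user -> number of distinct timeslots the user occurs in)."""
    slots_seen = defaultdict(set)
    for slot, user in events:
        slots_seen[user].add(slot)
    return Counter({user: len(slots) for user, slots in slots_seen.items()})


# Hypothetical toy trace: user "a" bursts in slot 0 only,
# while user "b" occurs once in each of slots 0..3.
events = [(0, "a")] * 5 + [(0, "b"), (1, "b"), (2, "b"), (3, "b")]

freq = frequency_per_slot(events)
pers = persistence(events)

print(freq[0].most_common(1))  # most frequent user in slot 0
print(pers.most_common(1))     # most persistent user over all slots
```

In this toy trace, user `a` dominates slot 0 by frequency, while user `b` never dominates a single slot yet appears in the most timeslots. Exact counting of this kind keeps per-user state for every timeslot, which is the storage cost that compact sketches are meant to avoid.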