HeavyGuardian: Separate And Guard Hot Items In Data Streams

Authors:
Tong Yang Peking University
Junzhi Gong Peking University
Haowei Zhang Peking University
Lei Zou Peking University
Lei Shi SKLCS, Institute of Software, Chinese Academy of Sciences
Xiaoming Li Peking University

Introduction:

Data stream processing is a fundamental issue in many fields. The authors propose a novel data structure named HeavyGuardian.

Abstract:

Data stream processing is a fundamental issue in many fields, such as data mining, databases, and network traffic measurement. There are five typical tasks in data stream processing: frequency estimation, heavy hitter detection, heavy change detection, frequency distribution estimation, and entropy estimation. Different algorithms have been proposed for different tasks, but they seldom achieve high accuracy and high speed at the same time. To address this issue, we propose a novel data structure named HeavyGuardian. The key idea is to intelligently separate and guard the information of hot items while approximately recording the frequencies of cold items. We deploy HeavyGuardian on the above five typical tasks. Extensive experimental results show that HeavyGuardian achieves both much higher accuracy and higher speed than the state-of-the-art solutions for each of the five typical tasks. The source code of HeavyGuardian and other related algorithms is available on GitHub.
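The abstract states the key idea only at a high level, so the following is a minimal sketch of what "separate and guard hot items, approximately record cold items" could look like, not the authors' implementation. It assumes each hashed bucket holds a few exact (item, count) cells for hot items plus a small array of shared counters for cold items, and that a full bucket weakens its smallest hot item with a probability that shrinks as that item's count grows. The class name HotColdSketch, every parameter, and the decay base 1.08 are hypothetical choices made for illustration.

```python
import random
import zlib


class HotColdSketch:
    """Illustrative hot/cold separation; an assumption-laden sketch only."""

    def __init__(self, num_buckets=64, hot_cells=4, cold_counters=16,
                 decay_base=1.08, seed=0):
        self.num_buckets = num_buckets
        self.hot_cells = hot_cells          # exact cells per bucket (hypothetical)
        self.cold_counters = cold_counters  # shared counters per bucket (hypothetical)
        self.decay_base = decay_base        # assumed decay base, not from the abstract
        self.rng = random.Random(seed)
        self.hot = [dict() for _ in range(num_buckets)]
        self.cold = [[0] * cold_counters for _ in range(num_buckets)]

    def _positions(self, item):
        # One CRC32 hash picks the bucket and the cold counter inside it.
        h = zlib.crc32(str(item).encode("utf-8"))
        return h % self.num_buckets, (h // self.num_buckets) % self.cold_counters

    def insert(self, item):
        b, c = self._positions(item)
        hot = self.hot[b]
        if item in hot:                  # already guarded: exact increment
            hot[item] += 1
            return
        if len(hot) < self.hot_cells:    # free hot cell: start guarding the item
            hot[item] = 1
            return
        # Bucket is full: weaken the smallest hot item with probability
        # decay_base ** -count, so large counts are rarely disturbed.
        weakest = min(hot, key=hot.get)
        if self.rng.random() < self.decay_base ** -hot[weakest]:
            hot[weakest] -= 1
            if hot[weakest] == 0:        # weakest item evicted, newcomer takes the cell
                del hot[weakest]
                hot[item] = 1
                return
        self.cold[b][c] += 1             # otherwise record the item approximately

    def estimate(self, item):
        b, c = self._positions(item)
        if item in self.hot[b]:
            return self.hot[b][item]
        return self.cold[b][c]           # shared counter, may overestimate

    def heavy_hitters(self, threshold):
        return {k: v for hot in self.hot for k, v in hot.items() if v >= threshold}
```

A short usage example: after `s = HotColdSketch()` and repeated calls to `s.insert(x)` over a stream, `s.estimate(x)` gives a frequency estimate and `s.heavy_hitters(t)` lists guarded items whose count is at least `t`. The same two parts (exact hot cells, approximate cold counters) are what would be queried for the other tasks named in the abstract, though how the authors do so is not described here.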

You may want to know: