|Gyoung S. Na|Pohang University of Science and Technology (POSTECH)|
|Donghyun Kim|Pohang University of Science and Technology (POSTECH)|
|Hwanjo Yu|Pohang University of Science and Technology (POSTECH)|
This paper studies extensions of the Local Outlier Factor (LOF) outlier detection algorithm to data streams. The authors propose a new outlier detection algorithm for data streams, called DILOF, that effectively overcomes the limitations of existing LOF-based streaming algorithms.
With the rapidly growing demand for detecting outliers in data streams, many studies have aimed to extend the well-known outlier detection algorithm Local Outlier Factor (LOF) to data streams. However, existing LOF-based algorithms for data streams still suffer from two inherent limitations: 1) they require a large amount of memory, and 2) they fail to detect a long sequence of outliers. In this paper, we propose a new outlier detection algorithm for data streams, called DILOF, that effectively overcomes these limitations. To this end, we first develop a novel density-based sampling algorithm to summarize past data and then propose a new strategy for detecting a sequence of outliers. Notably, the proposed algorithms do not require any prior knowledge of, or assumptions about, the data distribution. Moreover, we make DILOF about 15 times faster with a powerful distance approximation technique. Our comprehensive experiments on real-world datasets demonstrate that DILOF significantly outperforms the state-of-the-art competitors in terms of accuracy and execution time. The source code for the proposed algorithm is available at our website: http://di.postech.ac.kr/DILOF.
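The abstract builds on LOF without restating how the score is computed. As a reference point, the following is a minimal, brute-force Python sketch of the standard (static, non-streaming) LOF score; it is not DILOF, and the function names `knn` and `lof_scores` and the toy data are illustrative assumptions. The demo also gives one plausible intuition for the second limitation: when at least k outliers lie close together, as a burst of consecutive outliers in a stream might, they become each other's nearest neighbours and their LOF scores stay near 1.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn(points: Sequence[Point], k: int) -> List[List[int]]:
    """Return the indices of the k nearest neighbours of every point.

    Brute force, O(n^2 log n). Ties at the k-th distance are broken by index,
    a simplification: the original LOF definition keeps every point that lies
    within the k-distance.
    """
    result = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        result.append(others[:k])
    return result


def lof_scores(points: Sequence[Point], k: int) -> List[float]:
    """Static LOF score of every point; scores well above 1 suggest outliers."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(points)")
    nbrs = knn(points, k)
    # k-distance(o): distance from o to its k-th nearest neighbour.
    k_dist = [math.dist(points[i], points[nbrs[i][-1]]) for i in range(n)]

    def reach_dist(p: int, o: int) -> float:
        # reach-dist_k(p, o) = max(k-distance(o), d(p, o))
        return max(k_dist[o], math.dist(points[p], points[o]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for p in range(n):
        mean_reach = sum(reach_dist(p, o) for o in nbrs[p]) / k
        if mean_reach == 0.0:
            raise ValueError("too many duplicate points for this k")
        lrd.append(1.0 / mean_reach)

    # LOF(p): mean ratio between the neighbours' lrd and p's own lrd.
    return [sum(lrd[o] for o in nbrs[p]) / (k * lrd[p]) for p in range(n)]


if __name__ == "__main__":
    inliers = [(float(x), float(y)) for x in range(5) for y in range(5)]
    isolated = [(10.0, 10.0)]
    # Hypothetical tight run of 5 outliers, e.g. arriving consecutively in a stream.
    burst = [(20.0, 0.0), (20.1, 0.0), (20.0, 0.1), (20.1, 0.1), (20.05, 0.05)]
    scores = lof_scores(inliers + isolated + burst, k=3)
    print("isolated outlier:", round(scores[25], 2))
    print("burst of outliers:", [round(s, 2) for s in scores[26:]])
```

With k = 3, the isolated point scores far above 1, while every point of the five-point burst scores close to 1 because its nearest neighbours are the other burst points. This is only a static illustration of that masking effect; the strategy DILOF uses to detect outlier sequences is the one described in the paper, not this code.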