A Fast Distance-Based Outlier Detection Technique Using a Divisive Hierarchical Clustering Algorithm

Xiaochun Wang, Xiali Wang, Mitch Wilkes

2021

Today’s real-world databases typically hold millions of items with many thousands of fields, producing data sets that reach terabytes in size. As a result, traditional distribution-based outlier detection techniques are increasingly limited, and approaches that identify unusual samples by their distances to neighboring samples have become increasingly popular. The problem with these k-nearest neighbor-based methods is that they are computationally expensive for large data sets. Moreover, today’s databases are often too large to fit into main memory at once, so memory capacity and, correspondingly, I/O cost become important issues. In this chapter, we present a simple distance-based outlier detection algorithm that can compete with existing solutions in both CPU and I/O efficiency.
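
To make the cost concern concrete, the following is a minimal sketch of a generic brute-force k-nearest-neighbor outlier score. It is not the algorithm presented in the chapter. The function names `knn_outlier_scores` and `top_outliers`, and the choice of the distance to the k-th nearest neighbor as the score, are assumptions made for illustration only. Every point is compared with every other point, which is the quadratic cost that makes such methods expensive on large data sets.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    Illustrative baseline only (hypothetical helper, not the chapter's method).
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point: O(n) per point, O(n^2) overall.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points, k, n):
    """Return the indices of the n points with the largest k-NN distance."""
    scores = knn_outlier_scores(points, k)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:n]


if __name__ == "__main__":
    # Four points on a unit square plus one point far away from them.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_outliers(data, k=2, n=1))
```

In this toy data set the far-away point receives the largest score. The sketch also keeps all points in memory at once, which is exactly what becomes impractical when the data do not fit into main memory.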
