A Fast Distance-Based Outlier Detection Technique Using a Divisive Hierarchical Clustering Algorithm

2021 
Today’s real-world databases typically contain millions of items with many thousands of fields, resulting in data sets that range into the terabytes. As a result, traditional distribution-based outlier detection techniques are increasingly limited, and approaches that find unusual samples in a data set based on their distances to neighboring samples have become increasingly popular. The problem with these k-nearest-neighbor-based methods is that they are computationally expensive for large data sets. At the same time, today’s databases are often too large to fit into main memory at once, so memory capacity and, correspondingly, I/O cost become important issues. In this chapter, we present a simple distance-based outlier detection algorithm that can compete with existing solutions in both CPU and I/O efficiency.
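For orientation, the k-nearest-neighbor notion of an outlier mentioned above can be sketched as a brute-force baseline. This is not the chapter's algorithm: the function names `knn_outlier_scores` and `top_outliers`, the Euclidean metric, and the choice to score a sample by the distance to its k-th nearest neighbor are all assumptions made for illustration. The sketch compares every pair of samples, which is the quadratic cost that makes such methods expensive on large data sets.

```python
import heapq
import math


def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    Brute force: every point is compared with every other point,
    so the cost grows quadratically with the number of points.
    Assumes 1 <= k <= len(points) - 1.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to all other points (Euclidean, an assumption).
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        # The largest of the k smallest distances is the k-th neighbor distance.
        scores.append(heapq.nsmallest(k, dists)[-1])
    return scores


def top_outliers(points, k, n):
    """Return the indices of the n points with the largest scores."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical toy data: four clustered points and one distant point.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_outliers(data, k=2, n=1))
```

On the toy data, the distant point at index 4 receives the largest score because even its nearest neighbors are far away.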