A new multi-layer clustering ensemble framework based on different closeness measures

2017 
Clustering ensembles have attracted much attention in recent years. Many clustering ensemble frameworks use simple partitional clustering methods, most famously k-means, as the ensemble's member “clusterers” because of their low computational complexity. These ensemble approaches extend the scope of application of individual clustering algorithms and improve the robustness of the final clustering results. However, applying an ensemble does not resolve every problem of the underlying clustering algorithms. For example, a clustering ensemble based on k-means may still be unable to deal effectively with clustering tasks that involve arbitrarily shaped clusters or imbalanced cluster sizes. This problem is caused by the geometric-distance-based closeness measures (e.g., the Euclidean distance) used in the member clusterers. In this paper, we propose a multi-layer clustering ensemble framework in which different kinds of closeness measures are used for different data points within one clustering task. In this framework, the data points that are hard to deal with (called the “bad data points”, e.g., the data points in the “overlapping” region of two non-spherical clusters) are identified based on the outputs of a group of member clusterers, and the closeness between these bad data points is calculated with a non-geometric-distance-based closeness measure called CMNC. In this way, the new framework can counteract the drawbacks of traditional ensemble approaches based on partitional clustering algorithms while partially retaining their merit of low computational cost.
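The abstract does not state how the bad data points are identified from the member clusterers' outputs, nor how CMNC is defined. As a rough illustration only, the sketch below flags points on which the member clusterings disagree, using a co-association matrix (the fraction of members that place two points in the same cluster). The disagreement score, the threshold, and the function names `coassociation`, `instability`, and `bad_points` are all assumptions made for this example and are not taken from the paper.

```python
def coassociation(labelings):
    """Fraction of member clusterings in which each pair of points shares a cluster."""
    runs = len(labelings)
    n = len(labelings[0])
    return [[sum(lab[i] == lab[j] for lab in labelings) / runs for j in range(n)]
            for i in range(n)]


def instability(labelings):
    """Per-point disagreement score in [0, 1] (assumed criterion, not the paper's)."""
    co = coassociation(labelings)
    n = len(co)
    scores = []
    for i in range(n):
        # 4*f*(1-f) is 0 when all members agree about a pair (f = 0 or 1)
        # and 1 when they split evenly (f = 0.5).
        vals = [4 * co[i][j] * (1 - co[i][j]) for j in range(n) if j != i]
        scores.append(sum(vals) / len(vals) if vals else 0.0)
    return scores


def bad_points(labelings, threshold=0.5):
    """Indices of points whose score exceeds a hypothetical threshold."""
    return [i for i, s in enumerate(instability(labelings)) if s > threshold]


# Outputs of three member clusterers on five points.
# Label ids need not match across members; only co-membership is used.
labelings = [
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
print(bad_points(labelings))  # [2]: point 2 switches sides between members
```

In a layered scheme of the kind the abstract describes, only the flagged points would then be compared with the more expensive non-geometric closeness measure, while the remaining points keep the cheap partitional result.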