A new multi-layer clustering ensemble framework based on different closeness measures
2017
Clustering ensembles have attracted much attention in recent years. In many clustering ensemble frameworks, simple partitional clustering methods, e.g., the well-known k-means, are used as the ensemble's member “clusterers” because of their low computational complexity. These ensemble approaches extend the scope of application of individual clustering algorithms and improve the robustness of the final clustering results. However, applying an ensemble does not settle many of the problems of the underlying clustering algorithms. For example, a clustering ensemble based on k-means might still not be able to deal effectively with clustering tasks that involve arbitrarily shaped clusters or imbalanced cluster sizes. This problem is caused by the geometric-distance-based closeness measures (e.g., the Euclidean distance) used in the member clusterers. In this paper, we propose a multi-layer clustering ensemble framework in which different kinds of closeness measures are used for different data points in one clustering task. In this framework, the data points that are hard to deal with (called the “bad data points”, e.g., the data points in the “overlapping” region of two non-spherical clusters) are identified based on the outputs of a group of member clusterers, and the closeness between these bad data points is calculated with a non-geometric-distance-based closeness measure called CMNC. In this way, the new framework can counteract the drawbacks of traditional ensemble approaches based on partitional clustering algorithms while partially retaining their merit of low computational cost.
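The idea of identifying bad data points from the outputs of several member clusterers can be illustrated with a minimal sketch. It is not the paper's procedure: the abstract does not state the actual criterion, so the co-association thresholds `low` and `high`, the helper names, and the plain Lloyd-style k-means below are all assumptions made for illustration. The CMNC measure itself is not defined in the abstract and is not implemented here.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd-style k-means on tuples; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def co_association(points, k, runs, seed=0):
    """Fraction of member clusterers that put points i and j together."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


def flag_bad_points(coassoc, low=0.2, high=0.8):
    """Hypothetical criterion (an assumption, not the paper's): a point is
    'bad' if the members disagree about its relation to some other point,
    i.e. a co-association value lies strictly between low and high."""
    n = len(coassoc)
    return [
        i for i in range(n)
        if any(low < coassoc[i][j] < high for j in range(n) if j != i)
    ]


if __name__ == "__main__":
    blob_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
    blob_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
    matrix = co_association(blob_a + blob_b, k=2, runs=10)
    print(flag_bad_points(matrix))
```

In a framework of the kind described above, the flagged indices would be the points handed to the second layer, where a non-geometric closeness measure replaces the Euclidean distance; the remaining points keep the cheap partitional result.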
Keywords:
- Machine learning
- k-medians clustering
- Correlation clustering
- Cluster analysis
- Single-linkage clustering
- Artificial intelligence
- FLAME clustering
- Canopy clustering algorithm
- CURE data clustering algorithm
- Mathematics
- Brown clustering
- Pattern recognition
- Computer science
- Data mining
- Data stream clustering
- Fuzzy clustering
- Consensus clustering
- Constrained clustering