Highly correlated feature set selection for data clustering

M.R. Sumalatha, M. Ananthi, A. Arvind, N. Navin, C. Siddarth

2014

Feature set selection is the process of identifying a subset of features that produces the same result as the full feature set. Selecting such a subset helps in clustering datasets. In this paper, a Highly Correlated Feature set Selection (HCFS) algorithm is proposed for clustering data. The algorithm selects features based on their relevance and redundancy, and the selected features are finally clustered according to how strongly they are correlated with each other. The main objective of this paper is to identify feature subsets that improve classification performance by constructing a minimum spanning tree (MST) over the features. The HCFS algorithm works in two steps. In the first step, the features are divided into clusters using the spanning tree construction process. In the second step, cluster representatives are selected using the Frequent Pattern Analysis (FPA) technique to form an effective feature set, which reduces the time required for query evaluation. Redundant and irrelevant features are removed based on their Symmetric Uncertainty (SU) values. Together, these steps improve the efficiency of the data clustering process.
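
The abstract does not give the details of HCFS, so the following is only a minimal sketch of the first step (SU-based feature clustering over an MST), not the authors' implementation. It assumes discrete-valued features, uses the standard definition SU(X, Y) = 2 · I(X; Y) / (H(X) + H(Y)), takes 1 − SU as the edge weight, and forms clusters by cutting MST edges above a threshold. The edge weighting, the cutting rule, the `max_weight` parameter, and all function names are assumptions made for illustration; the FPA step for choosing cluster representatives is not covered.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), in the range [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both features constant; treated as uncorrelated here
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def feature_mst(features):
    """Prim's algorithm on the complete feature graph.

    Edge weight is 1 - SU (an assumption of this sketch), so strongly
    correlated features are joined by light edges.
    Returns a list of (weight, feature_a, feature_b) tuples.
    """
    names = list(features)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = min(
            (1.0 - symmetric_uncertainty(features[a], features[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        edges.append(best)
        in_tree.add(best[2])
    return edges


def clusters_from_mst(names, edges, max_weight):
    """Keep MST edges with weight <= max_weight and return the resulting
    connected components as sorted lists of feature names."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for weight, a, b in edges:
        if weight <= max_weight:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: f2 duplicates f1, f4 is the complement of f3,
# and the two pairs are statistically independent of each other.
toy_features = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],
    "f2": [0, 0, 0, 0, 1, 1, 1, 1],
    "f3": [0, 1, 0, 1, 0, 1, 0, 1],
    "f4": [1, 0, 1, 0, 1, 0, 1, 0],
}
toy_edges = feature_mst(toy_features)
toy_clusters = clusters_from_mst(list(toy_features), toy_edges, max_weight=0.5)
```

On this toy data the two redundant pairs end up in separate clusters; a representative from each cluster would then be chosen by whatever selection rule the second step prescribes.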
