Highly correlated feature set selection for data clustering

2014 
Feature set selection is the process of identifying a subset of features that produces the same result as the full feature set. Selecting such a subset helps in clustering datasets. In this paper, a Highly Correlated Feature set Selection (HCFS) algorithm is proposed for clustering data. The algorithm selects features based on their relevance and redundancy, and the selected features are then clustered according to how strongly they are correlated with each other. The main objective of this paper is to identify feature subsets that improve classification performance by constructing a minimum spanning tree (MST) over the features. The HCFS algorithm works in two steps. In the first step, the features are divided into clusters using the spanning tree construction process. In the second step, cluster representatives are selected using the Frequent Pattern Analysis (FPA) technique to form an effective feature set, which reduces the time required for query evaluation. Redundant and irrelevant features are removed based on their Symmetric Uncertainty (SU) values. This improves the efficiency of the data clustering process.
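The first step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the edge weighting or the rule for splitting the tree, so the sketch assumes discrete features, the standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), an edge weight of 1 - SU, and a hypothetical `cut_threshold` parameter for removing weak edges. All function names are illustrative, and the FPA step for choosing cluster representatives is not covered.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both features are constant; treat as uninformative
    joint = entropy(list(zip(x, y)))
    mutual_info = hx + hy - joint
    return 2.0 * mutual_info / (hx + hy)


def mst_edges(features):
    """Prim's algorithm on the complete feature graph.

    Assumption: edge weight is 1 - SU, so strongly correlated
    features are joined by light edges.
    """
    names = list(features)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = min(
            (1.0 - symmetric_uncertainty(features[a], features[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        edges.append(best)
        in_tree.add(best[2])
    return edges


def cluster_features(features, cut_threshold):
    """Keep MST edges with weight <= cut_threshold (a hypothetical
    parameter) and return the connected components as clusters."""
    parent = {name: name for name in features}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for weight, a, b in mst_edges(features):
        if weight <= cut_threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in features:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in clusters.values())


# Toy data: a_inv is a relabelling of a, and b_dup duplicates b.
toy = {
    "a": [0, 0, 1, 1],
    "a_inv": [1, 1, 0, 0],
    "b": [0, 1, 0, 1],
    "b_dup": [0, 1, 0, 1],
}
clusters = cluster_features(toy, cut_threshold=0.5)
```

On this toy data the two redundant pairs end up in separate clusters, because each pair has SU of 1 internally while the pairs are independent of each other. Using 1 - SU as the weight is one simple choice; any weighting that decreases with correlation would fit the same outline.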