Classification Application Based on Mutual Information and Random Forest Method for High-Dimensional Data

2017 
Random Forest (RF) has been widely used to classify high-dimensional data. However, using all features of high-dimensional data for classification increases computation time and can reduce classification accuracy, so feature selection is critical for high-dimensional data classification. To address this problem, this paper presents a method that combines Conditional Mutual Information (CMI) and Random Forest (CMI-RF). CMI is used to remove irrelevant and redundant features, and RF is then used to obtain the optimal feature subset, i.e. the one giving higher classification accuracy. High-dimensional near-infrared spectral data are used as the experimental data. The experimental results demonstrate that the CMI-RF method can select a feature subset with stronger correlation to the class, no redundancy, and high classification accuracy.
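The abstract describes the two-stage pipeline only at a high level. The following is a minimal sketch of one possible reading, not the authors' implementation. It assumes the features have already been discretized (for example, binned absorbance values) and uses a greedy CMIM-style rule (conditional mutual information maximization) as a stand-in for the paper's unspecified CMI criterion. The Random Forest stage is represented by a caller-supplied `score_fn` that evaluates nested subsets of the ranked features. All function names (`cmim_rank`, `best_prefix`, etc.) are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Callable, List, Sequence, Tuple

Column = Sequence[int]  # one discretized feature (or the class label) over all samples


def entropy(*columns: Column) -> float:
    """Joint Shannon entropy, in bits, of one or more discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())


def mutual_info(x: Column, y: Column) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x: Column, y: Column, z: Column) -> float:
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def cmim_rank(features: Sequence[Column], y: Column, k: int) -> List[int]:
    """Greedy CMIM-style filter (assumed stand-in for the paper's CMI step).

    Each round picks the feature whose worst-case information about y,
    given any already-selected feature, is largest. An exact copy of a
    selected feature has I(copy; y | selected) = 0, so it is deferred.
    """
    remaining = list(range(len(features)))
    score = {i: mutual_info(features[i], y) for i in remaining}  # relevance
    selected: List[int] = []
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda i: score[i])  # first index wins ties
        selected.append(best)
        remaining.remove(best)
        for i in remaining:  # penalize redundancy with the newly chosen feature
            score[i] = min(score[i], cond_mutual_info(features[i], y, features[best]))
    return selected


def best_prefix(ranked: List[int],
                score_fn: Callable[[List[int]], float]) -> Tuple[List[int], float]:
    """Evaluate nested subsets ranked[:1], ranked[:2], ... and keep the best.

    In a CMI-RF setting, score_fn would return Random Forest classification
    accuracy on those columns; here it is any caller-supplied function.
    """
    best_subset: List[int] = []
    best_score = float("-inf")
    for m in range(1, len(ranked) + 1):
        s = score_fn(ranked[:m])
        if s > best_score:
            best_subset, best_score = ranked[:m], s
    return best_subset, best_score


# Toy data: y = a OR b; a_copy duplicates a; c is independent of y.
a = [0, 0, 0, 0, 1, 1, 1, 1]
a_copy = list(a)
b = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]
y = [ai | bi for ai, bi in zip(a, b)]

ranked = cmim_rank([a, a_copy, b, c], y, k=2)
print(ranked)  # [0, 2]: b is preferred over the redundant a_copy
```

In a real pipeline, `score_fn` could, for example, return the mean cross-validated accuracy of scikit-learn's `RandomForestClassifier` on the selected columns (via `cross_val_score`). Separating the filter from the scorer keeps the CMI ranking cheap, while the more expensive RF evaluation runs only on nested prefixes of the ranking.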