An Advanced Random Forest Algorithm Targeting the Big Data with Redundant Features

2017 
As we enter the age of big data, methods for handling large data sets are gaining popularity and novel algorithms keep emerging, among which the random forest stands out. A random forest fuses multiple sub decision trees for classification and regression, achieving high accuracy and good generalization. However, its performance degrades on data sets with heavy noise and redundant features. This degradation is mainly caused by inaccurate sub decision trees, and fusing all trees directly cannot suppress their negative effect. We therefore propose an advanced random forest that assigns lower selection probabilities to these poorly performing sub decision trees, so that they are less likely to be chosen in the fusion process, which improves predictive capability. The dropout and roulette-wheel selection used in this process ensure good generalization while maintaining high accuracy. We sample the original data set using K-fold division, which increases the diversity between sub decision trees and makes the prediction more credible. Finally, we validate the proposed method on several data sets. Experimental results show that, compared with the traditional random forest, our method achieves higher classification accuracy on data sets with noise and on data sets with many redundant features.
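The abstract does not give implementation details, but the general idea of weighting sub decision trees and selecting them by roulette with dropout can be sketched as below. This is a minimal, illustrative sketch, not the authors' code: it assumes scikit-learn decision trees, held-out (K-fold) accuracy as the fitness used for roulette-wheel probabilities, and a fixed dropout rate; the class name RouletteRandomForest and all parameter values are hypothetical.

```python
# Illustrative sketch of a roulette-weighted random forest with dropout.
# Assumptions (not from the paper): fitness = validation-fold accuracy,
# dropout rate = 0.2, scikit-learn base trees, numpy-array inputs.
from collections import Counter

import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier


class RouletteRandomForest:
    def __init__(self, n_estimators=50, n_splits=5, dropout=0.2, random_state=0):
        self.n_estimators = n_estimators
        self.n_splits = n_splits      # K in the K-fold division step
        self.dropout = dropout        # fraction of trees ignored per prediction
        self.rng = np.random.default_rng(random_state)
        self.trees, self.weights = [], []

    def fit(self, X, y):
        folds = list(KFold(self.n_splits, shuffle=True, random_state=0).split(X))
        for i in range(self.n_estimators):
            train_idx, val_idx = folds[i % self.n_splits]
            # Random feature subspace, as in a standard random forest.
            feat = self.rng.choice(
                X.shape[1], max(1, int(np.sqrt(X.shape[1]))), replace=False)
            tree = DecisionTreeClassifier(random_state=i)
            tree.fit(X[np.ix_(train_idx, feat)], y[train_idx])
            # Fitness = held-out accuracy; weak trees get low roulette probability.
            acc = tree.score(X[np.ix_(val_idx, feat)], y[val_idx])
            self.trees.append((tree, feat))
            self.weights.append(acc)
        w = np.asarray(self.weights)
        self.probs = w / w.sum()      # roulette-wheel selection probabilities
        return self

    def predict(self, X):
        # Dropout: randomly deactivate a fraction of trees for this call.
        active = np.flatnonzero(self.rng.random(len(self.trees)) >= self.dropout)
        if active.size == 0:          # guard: never drop every tree
            active = np.arange(len(self.trees))
        # Roulette: sample trees in proportion to their fitness.
        p = self.probs[active] / self.probs[active].sum()
        chosen = self.rng.choice(active, size=active.size, replace=True, p=p)
        preds = np.stack([self.trees[i][0].predict(X[:, self.trees[i][1]])
                          for i in chosen])
        # Plain majority vote over the chosen trees.
        return np.array([Counter(col).most_common(1)[0][0] for col in preds.T])


# Example usage on a synthetic data set with redundant features:
# from sklearn.datasets import make_classification
# X, y = make_classification(n_samples=500, n_features=40, n_informative=8)
# clf = RouletteRandomForest().fit(X, y)
# print(clf.predict(X[:5]))
```

The key difference from a plain random forest in this sketch is that fusion is not a uniform vote: trees that scored poorly on their validation fold receive proportionally lower roulette probabilities, and dropout adds further randomness to the subset of trees consulted at prediction time.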