Semi-Random Forest Based on Representative Patterns for Noisy and Non-Stationary Data Stream

2019 
Data streams often contain noise, and their distribution may change over time, a phenomenon known as concept drift. As a result, a classifier's previously learned decision boundary no longer fits new data, and performance degrades. To address these issues, this paper proposes a pattern-based classifier named Closed Frequent Pattern based Semi-Random Forest (CFPSRF), which represents the raw data with closed frequent patterns to remove redundant information and noise. A change measure for pattern sets is also proposed; it uses the mined patterns to quantify the magnitude of distribution change and thereby determine whether the classifier needs to be updated. To evaluate CFPSRF, we run experiments on both real-world and synthetic datasets under MOA. The experimental results show that our method outperforms the compared algorithms in average classification accuracy and deals effectively with concept drift and noise.
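To make the two ideas in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It mines closed frequent patterns from a small window of transactions by brute force, then compares the pattern sets of two windows. The function names, the `min_support` parameter, and the use of Jaccard distance as the change measure are all assumptions made for illustration; the abstract does not specify how CFPSRF computes its change measure.

```python
from itertools import combinations


def closed_frequent_patterns(transactions, min_support):
    """Brute-force closed frequent itemset mining for one small window.

    Returns a dict mapping each closed frequent itemset (a frozenset) to its
    support count. Exponential in the number of items: illustration only.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                frequent[candidate] = support
    # A frequent pattern is closed if no proper superset has the same support.
    return {
        p: s
        for p, s in frequent.items()
        if not any(p < q and s == frequent[q] for q in frequent)
    }


def pattern_set_change(old_patterns, new_patterns):
    """Jaccard distance between two pattern sets.

    This is an assumed stand-in for the paper's change measure, which the
    abstract does not define.
    """
    old_keys, new_keys = set(old_patterns), set(new_patterns)
    union = old_keys | new_keys
    if not union:
        return 0.0
    return 1.0 - len(old_keys & new_keys) / len(union)


# Two hypothetical windows from a stream with categorical items.
window_a = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
window_b = [{"c", "d"}, {"c", "d"}, {"a", "c"}]

patterns_a = closed_frequent_patterns(window_a, min_support=2)
patterns_b = closed_frequent_patterns(window_b, min_support=2)
change = pattern_set_change(patterns_a, patterns_b)
```

In a sketch like this, a classifier update would be triggered when `change` exceeds a user-chosen threshold. Note that `{"b"}` is frequent in `window_a` but not closed, because its superset `{"a", "b"}` has the same support; dropping such patterns is how a closed-pattern representation removes redundancy.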