VFC-SMOTE: very fast continuous synthetic minority oversampling for evolving data streams

2021 
The world is constantly changing, and so are the massive amount of data produced. However, only a few studies deal with online class imbalance learning that combines the challenges of class-imbalanced data streams and concept drift. In this paper, we propose the very fast continuous synthetic minority oversampling technique (VFC-SMOTE). It is a novel meta-strategy to be prepended to any streaming machine learning classification algorithm aiming at oversampling the minority class using a new version of Smote and Borderline-Smote inspired by Data Sketching. We benchmarked VFC-SMOTE pipelines on synthetic and real data streams containing different concept drifts, imbalance levels, and class distributions. We bring statistical evidence that VFC-SMOTE pipelines learn models whose minority class performances are better than state-of-the-art. Moreover, we analyze the time/memory consumption and the concept drift recovery speed.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    44
    References
    0
    Citations
    NaN
    KQI
    []