Imbalanced Dataset Optimization with New Resampling Techniques

2021 
Imbalanced datasets, which are very common in many application fields, pose a formidable problem for most machine learning algorithms. At the same time, such algorithms are being applied extensively in many areas, where they show promising results and outperform other approaches. Therefore, many techniques have been developed to re-balance datasets and thereby improve the applicability of machine learning. In this paper we present two algorithms, called G1No (Generative resampling 1-nearest Neighbour) and G1No Gourmet, which compensate for dataset imbalance by generating synthetic samples, and we compare them with two state-of-the-art re-balancing algorithms, namely the Synthetic Minority Oversampling Technique (SMOTE) and ADAptive SYNthetic sampling (ADASYN). The experiments, carried out on a realistic malware traffic dataset, MTA-KDD’19, show that G1No outperforms the other algorithms and is even able to improve the quality of the original dataset.
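To illustrate what "generating synthetic samples" means for the SMOTE baseline named above, here is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the G1No or G1No Gourmet algorithm, whose details are not given in this abstract. It is a simplified SMOTE variant, assuming numeric feature vectors and Euclidean distance: a random minority sample is picked, one of its k nearest minority neighbours is chosen, and a new point is placed at a random position on the segment between them. The function names `k_nearest` and `smote` and the `seed` parameter are hypothetical, chosen for this example.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def k_nearest(points: Sequence[Point], idx: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[idx] (Euclidean), excluding itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Hypothetical helper: generate n_new synthetic minority samples by
    SMOTE-style linear interpolation between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)                 # seeded for reproducibility
    k = min(k, len(minority) - 1)             # cannot have more neighbours than other points
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))      # random minority "seed" sample
        j = rng.choice(neighbours[i])         # one of its k nearest minority neighbours
        gap = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


# Usage: oversample a toy 2-D minority class with four points.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new = smote(minority, n_new=6, k=2)
print(len(new))  # 6
```

Because every synthetic point lies on a segment between two existing minority samples, SMOTE only fills in the region already spanned by the minority class. ADASYN uses the same interpolation step but allocates more synthetic samples to minority points whose neighbourhoods contain more majority-class samples.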