Data Augmentation for Heart Arrhythmia Classification

2020 
In this paper, we introduce a data augmentation technique and apply it to an ECG dataset from the UK Biobank for heart arrhythmia classification with the XGBoost algorithm. In most clinical datasets, the number of participants with a disease (positive samples) is considerably lower than the number of healthy participants (negative samples). As a result, a machine learning algorithm has too few diseased cases from which to train a reliable model. We have developed techniques to overcome this limitation by up-sampling the positive cases. To validate the technique, we assessed its reliability by comparing the augmented dataset with the original data distribution using the Wilcoxon signed-rank test. We also compared the XGBoost classifier's results with and without data augmentation, using the AUC (area under the curve) and Cohen's kappa as evaluation metrics. The AUC improved from 0.58 without augmentation to 0.83 with augmentation, and Cohen's kappa improved from 0 to 0.76, a value that indicates substantial agreement. These techniques can be applied to other data and are not limited to clinical studies.
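The abstract does not specify how the positive cases are up-sampled, so the following is only a minimal sketch under stated assumptions: it uses plain random oversampling with replacement as a stand-in for the paper's technique, and the helper names `upsample_minority` and `cohen_kappa` are hypothetical. It also shows why a classifier that predicts "healthy" for every participant gets a Cohen's kappa of 0 even when its raw accuracy is high, which is consistent with the un-augmented baseline reported above.

```python
import random
from collections import Counter


def upsample_minority(rows, labels, positive=1, seed=0):
    """Duplicate positive rows at random (with replacement) until classes balance.

    Assumption: simple random oversampling, not the paper's actual method.
    """
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == positive]
    n_neg = len(labels) - len(pos)
    if not pos or len(pos) >= n_neg:
        return list(rows), list(labels)  # nothing to balance
    extra = rng.choices(pos, k=n_neg - len(pos))
    return list(rows) + extra, list(labels) + [positive] * len(extra)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    classes = set(y_true) | set(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in classes) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: only one class appears anywhere
    return (observed - expected) / (1 - expected)


# Toy imbalanced data: 95 healthy (0) and 5 diseased (1) participants.
labels = [0] * 95 + [1] * 5
rows = [[float(i)] for i in range(len(labels))]  # placeholder features

# A model that always predicts "healthy" is 95% accurate but has kappa 0.
always_healthy = [0] * len(labels)
baseline_kappa = cohen_kappa(labels, always_healthy)

# After oversampling, both classes have the same number of rows.
balanced_rows, balanced_labels = upsample_minority(rows, labels)
balanced_counts = Counter(balanced_labels)
```

In practice, oversampling of this kind is usually applied to the training split only, so that duplicated positive cases do not leak into the evaluation data.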