K-means Clustering Based Undersampling for Lower Back Pain Data

2020 
Many people suffer from low back pain (LBP), and identifying it at an early stage is very important. Classification algorithms in machine learning can help predict whether a person suffers from LBP, but class imbalance is a common problem in real-world datasets, including the LBP dataset. In this paper, LBP diagnosis based on k-means clustering combined with undersampling is proposed. The first strategy combines k-means with stratified random sampling to undersample (KSS). The second strategy combines k-means with Manhattan distance to undersample (KMD). Experiments were conducted on the LBP dataset using classification systems. Performance is evaluated using the area under the curve (AUC) metric. The results show that the highest classification accuracy (0.92) on the LBP dataset is obtained when KSS is combined with logistic regression. KSS combined with a linear SVM has higher accuracy and stability.