K-means Clustering Based Undersampling for Lower Back Pain Data

Qian Zhou, Bo Sun, Yunsheng Song, Shuang Li

2020

Many people suffer from low back pain (LBP), and identifying LBP at an early stage is very important. Classification algorithms from machine learning can help predict whether a person suffers from low back pain, but class imbalance is a common problem in many real-world datasets, including the LBP dataset. This paper proposes LBP diagnosis based on k-means clustering combined with undersampling. The first strategy combines k-means with stratified random sampling to undersample the data (KSS). The second strategy combines k-means with the Manhattan distance to undersample the data (KMD). Experiments were conducted on the LBP dataset using several classifiers, and performance was evaluated with the area under the curve (AUC) metric. The results show that the highest classification accuracy (0.92) on the LBP dataset is obtained when KSS is combined with logistic regression, and that KSS combined with a linear SVM achieves higher accuracy and stability.
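The abstract names the two undersampling strategies without giving their exact steps, so the Python sketch below shows one plausible reading of each under stated assumptions. It assumes KSS clusters the majority class with k-means and then draws from each cluster in proportion to its size, and that KMD runs k-means with as many clusters as there are minority samples and keeps, for each centroid, the nearest unused majority sample by Manhattan distance. The function names (`kmeans`, `kss_undersample`, `kmd_undersample`, `auc`), the choice of k, and the synthetic data are hypothetical and are not taken from the paper; a rank-based AUC helper is included because AUC is the reported metric.

```python
import random
from typing import List, Sequence

Point = List[float]


def _nearest(p: Sequence[float], centroids: List[Point]) -> int:
    # Index of the centroid closest to p by squared Euclidean distance.
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))


def kmeans(points: List[Point], k: int, iters: int = 50, seed: int = 0):
    """Plain Lloyd's k-means (Euclidean); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centres
    for _ in range(iters):
        labels = [_nearest(p, centroids) for p in points]  # assignment step
        new = []
        for c in range(k):  # update step: mean of each cluster's members
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])  # keep an empty cluster's centre
        if new == centroids:
            break
        centroids = new
    return centroids, [_nearest(p, centroids) for p in points]


def kss_undersample(majority: List[Point], n_target: int, k: int, seed: int = 0) -> List[Point]:
    """Hypothetical KSS: k-means on the majority class, then stratified random
    sampling with each cluster's quota proportional to its size."""
    rng = random.Random(seed)
    _, labels = kmeans(majority, k, seed=seed)
    clusters = [[p for p, lab in zip(majority, labels) if lab == c] for c in range(k)]
    raw = [n_target * len(m) / len(majority) for m in clusters]
    quotas = [int(r) for r in raw]
    # Largest-remainder rounding so the quotas sum exactly to n_target.
    order = sorted(range(k), key=lambda c: raw[c] - quotas[c], reverse=True)
    for c in order[: n_target - sum(quotas)]:
        quotas[c] += 1
    picked: List[Point] = []
    for members, q in zip(clusters, quotas):
        picked.extend(rng.sample(members, min(q, len(members))))
    return picked


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    # L1 (Manhattan) distance between two feature vectors.
    return sum(abs(x - y) for x, y in zip(a, b))


def kmd_undersample(majority: List[Point], n_target: int, seed: int = 0) -> List[Point]:
    """Hypothetical KMD: k = n_target clusters; for each centroid keep the
    closest not-yet-chosen majority sample under Manhattan distance."""
    centroids, _ = kmeans(majority, n_target, seed=seed)
    used, chosen = set(), []
    for c in centroids:
        idx = min((i for i in range(len(majority)) if i not in used),
                  key=lambda i: manhattan(majority[i], c))
        used.add(idx)
        chosen.append(majority[idx])
    return chosen


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive scores above random negative), ties = 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-in data, NOT the LBP dataset: 90 majority vs 20 minority points.
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
    minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
    kss = kss_undersample(majority, len(minority), k=4)
    kmd = kmd_undersample(majority, len(minority))
    print(len(kss), len(kmd))  # prints: 20 20
```

Either reduced majority set would then be combined with the minority samples before training a classifier such as logistic regression or a linear SVM, and the classifier's scores on held-out data passed to `auc`. The hand-written k-means only keeps the sketch dependency-free; a library implementation would normally be used in practice.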
