Using Machine Learning to Predict the Future Development of Disease

2020 
The objective of this research is to develop a long-term risk model for the development of cardiovascular disease (CVD) as a consequence of type-2 diabetes (T2D). We apply the support vector machine (SVM) and K-nearest neighbours (KNN) algorithms to a dataset collected from a longitudinal study, the Framingham Heart Study, to build the prediction models. The dataset was first balanced with the Synthetic Minority Oversampling Technique (SMOTE). The SVM model was then trained, and after parameter tuning and 1000 training runs, the average accuracy in predicting the prevalence of CVD due to T2D was 96.5% and the average recall was 89.8%. We also trained a KNN model on the dataset, which reached a recall of 92.9%. The advantages of our model are: 1) it can predict the risk of developing both T2D and CVD simultaneously with high accuracy; 2) it can be used without the expensive and tedious oral glucose tolerance test. The model yielded high-performance results after training on the Framingham Heart Study dataset.
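The balancing and nearest-neighbour steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, runs on synthetic two-feature toy data rather than the Framingham Heart Study dataset, and omits the SVM model entirely. The function names (`smote`, `knn_predict`, `accuracy_and_recall`), the choice of k = 3, and the cluster parameters are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, pool, k):
    """Return the k points in pool closest to point, excluding point itself."""
    others = [p for p in pool if p is not point]
    return sorted(others, key=lambda p: euclidean(point, p))[:k]


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority samples.

    Each sample is placed at a random position on the line segment between
    a randomly chosen minority point and one of its k nearest minority
    neighbours, which is the core idea of minority oversampling.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(nearest(base, minority, k))
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def knn_predict(train_x, train_y, point, k=3):
    """Majority vote among the k training points closest to point (labels 0/1)."""
    ranked = sorted(zip(train_x, train_y), key=lambda xy: euclidean(xy[0], point))
    votes = [label for _, label in ranked[:k]]
    return 1 if sum(votes) * 2 > len(votes) else 0


def accuracy_and_recall(y_true, y_pred):
    """Accuracy = correct / total; recall = TP / (TP + FN) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return correct / len(pairs), recall


# Toy imbalanced data (hypothetical): 40 negative cases, 8 positive cases.
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
minority = [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(8)]

# Oversample the minority class until both classes have the same size.
synthetic = smote(minority, n_new=len(majority) - len(minority), k=3, rng=rng)
train_x = majority + minority + synthetic
train_y = [0] * len(majority) + [1] * (len(minority) + len(synthetic))

# Classify a few held-out toy points and score the predictions.
test_x = [(0.0, 0.0), (4.0, 4.0), (0.5, -0.5), (3.5, 4.5)]
test_y = [0, 1, 0, 1]
pred = [knn_predict(train_x, train_y, p, k=3) for p in test_x]
acc, rec = accuracy_and_recall(test_y, pred)
print(f"accuracy={acc:.3f} recall={rec:.3f}")
```

Recall is reported alongside accuracy because, in a risk model of this kind, a missed positive case (a false negative) is typically the costlier error. In practice, oversampling would be applied only to the training split so that synthetic points do not leak into the evaluation data.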