Big Data Analytics: Schizophrenia Prediction on Apache Spark.

2020 
Nowadays, the volume of medical record data is growing dramatically, covering patient demographics, clinical records, symptoms, and nursing diagnoses. Medical records of schizophrenia patients are no exception: according to the WHO, schizophrenia affected 20 million people in 2019. These records can be used for pre-diagnostic tasks in schizophrenia cases by applying big data analytics. The main objective of this study is to design a model that predicts the type of schizophrenia (Paranoid, Catatonic, Residual, Hebephrenic, Simplex, and Undifferentiated) from a medical record dataset of schizophrenic patients. The dataset is used in a comparative experiment with five machine learning classification algorithms, namely Artificial Neural Network (ANN), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR), and Decision Tree (DT), implemented with the MLlib library on Apache Spark. Optimization experiments were also carried out, using the L-BFGS optimizer for ANN and 10-fold cross-validation for the four other classification algorithms, to obtain optimal results. The best results for schizophrenia case prediction were achieved by Random Forest, which outperformed the four other classification algorithms with an accuracy of 0.93, a precision of 0.93, a recall of 0.93, and an F1-measure of 0.92. It is followed by ANN, DT, LR, and NB.
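The abstract does not say how precision, recall, and F1 are averaged over the six schizophrenia types. One plausible reading is that they are support-weighted averages, which correspond to the "weightedPrecision", "weightedRecall", and "f1" metric names of Spark MLlib's MulticlassClassificationEvaluator. That is an assumption, not something the study states. The plain-Python sketch below computes those weighted definitions from true and predicted labels. The function name `weighted_metrics` and the sample labels are hypothetical and for illustration only.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1.

    Assumption: each class is weighted by its share of the true labels,
    as in the usual "weighted" multiclass averaging.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    support = Counter(y_true)    # number of true samples per class
    predicted = Counter(y_pred)  # number of predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class

    w_prec = w_rec = w_f1 = 0.0
    for label, count in support.items():
        prec = tp[label] / predicted[label] if predicted[label] else 0.0
        rec = tp[label] / count
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = count / n  # class share of the true labels
        w_prec += weight * prec
        w_rec += weight * rec
        w_f1 += weight * f1

    return {
        "accuracy": sum(tp.values()) / n,
        "weightedPrecision": w_prec,
        "weightedRecall": w_rec,
        "f1": w_f1,
    }


# Hypothetical labels, not data from the study.
y_true = ["Paranoid", "Paranoid", "Paranoid", "Residual", "Residual", "Catatonic"]
y_pred = ["Paranoid", "Paranoid", "Residual", "Residual", "Residual", "Paranoid"]
print({k: round(v, 3) for k, v in weighted_metrics(y_true, y_pred).items()})
# prints {'accuracy': 0.667, 'weightedPrecision': 0.556, 'weightedRecall': 0.667, 'f1': 0.6}
```

Under this weighting, weighted recall always equals accuracy, because the sum of each class share times that class's recall reduces to total correct predictions divided by total samples. That would be consistent with the identical 0.93 accuracy and recall reported for Random Forest. Weighted F1 averages per-class harmonic means, so it can come out slightly below both weighted precision and weighted recall.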