Using Automated-Machine Learning to Predict COVID-19 Patient Mortality.

2021 
BACKGROUND: In a pandemic, it is important for clinicians to stratify patients and decide who receives limited medical resources. Machine learning (ML) models have been proposed to accurately predict COVID-19 disease severity. Previous studies have typically tested only one ML algorithm and limited performance evaluation to the area under the curve (AUC). To obtain the best possible results, it may be important to test multiple ML algorithms. OBJECTIVE: In this study, we used automated machine learning (autoML) to train various ML algorithms. We selected the model that best predicted the chance of patient survival from COVID-19 infection. In addition, we investigated which variables (i.e., vital signs, biomarkers, comorbidities, etc.) were most influential in generating an accurate model. METHODS: Data were retrospectively collected at our institution on all patients testing positive for COVID-19 between 3/1/2020 and 7/3/2020. We collected 48 variables from each patient within 36 hours before or after the index time (RT-PCR positivity). Patients were followed up for 30 days or until death. AutoML used these data to build 20 ML models with various algorithms. Model performance was measured primarily by the area under the precision-recall curve (AUCPR). Subsequently, we established model interpretability by using Shapley additive explanations (SHAP) and partial dependence plots (PDP) to identify and rank the variables that drove model predictions. Finally, dimensionality reduction was conducted to extract the 10 most influential variables. AutoML was retrained using only these 10 variables, and the resulting models were evaluated against the models that used 48 variables. RESULTS: Data from 4313 patients were used. The best model that autoML generated using 48 variables was the stacked ensemble model (AUCPR = 0.807). The two best independent models were the gradient boost model (GBM) and the extreme gradient boost (XGBoost) model, with AUCPRs of 0.803 and 0.793, respectively. Deep learning models were significantly inferior (AUCPR = 0.73). The ten most influential variables in generating high-performing models were systolic and diastolic blood pressure, age, pulse oximetry, blood urea nitrogen, lactate dehydrogenase, D-dimer, troponin, respiratory rate, and Charlson comorbidity score. When autoML was retrained with these 10 variables, the stacked ensemble model again performed best, with an AUCPR of 0.791. CONCLUSIONS: By using autoML, we have developed high-performing models that predict survival from COVID-19 infection. In addition, we identified important variables that correlated with mortality. This is proof of concept that autoML is an efficient, effective, and informative method to generate ML-based clinical decision support tools.
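As an illustration of the AUCPR metric used to rank the models, the following is a minimal sketch of one common estimator, the step-wise average precision. It is not the authors' pipeline: the function name `average_precision`, the toy labels, and the two hypothetical score lists are assumptions made for this example, and the abstract does not state which estimator or which positive class the study used.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    y_true: 1 for the outcome of interest, 0 otherwise (assumed coding).
    scores: predicted probability, or any score, for the outcome of interest.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank cases from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    area = 0.0
    true_pos = 0
    prev_recall = 0.0
    i = 0
    while i < len(ranked):
        # Treat tied scores as a single threshold so their order cannot matter.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            true_pos += ranked[j][1]
            j += 1
        precision = true_pos / j          # j cases are flagged at this threshold
        recall = true_pos / total_pos
        area += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return area


# Toy data, invented for illustration only.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]  # hypothetical model A
model_b = [0.9, 0.2, 0.8, 0.7, 0.3, 0.1, 0.6, 0.4]  # hypothetical model B

# Select the candidate with the higher AUCPR, as one would on a leaderboard.
results = {"model_a": average_precision(labels, model_a),
           "model_b": average_precision(labels, model_b)}
best = max(results, key=results.get)
print(results, best)
```

Unlike the area under the ROC curve, this quantity depends on outcome prevalence: a model that assigns every patient the same score obtains an AUCPR equal to the fraction of positive cases, which is the baseline against which reported values should be read.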