Determining a minimum set of variables for machine learning cardiovascular event prediction: results from REFINE SPECT registry.

2021 
Aims Optimal risk stratification with machine learning (ML) from myocardial perfusion imaging (MPI) includes both clinical and imaging data. While most imaging variables can be derived automatically, clinical variables require manual collection, which is time-consuming and prone to error. We determined the fewest manually input and imaging variables required to maintain the prognostic accuracy for major adverse cardiac events (MACE) in patients undergoing single-photon emission computed tomography (SPECT) MPI.
Methods and results This study included 20,414 patients from the multicenter REFINE SPECT registry and 2,984 from the University of Calgary for training and external testing of the ML models, respectively. ML models were trained using all variables (ML-All) and all image-derived variables (including age and sex, ML-Image). Next, ML models were sequentially trained by incrementally adding manually input and imaging variables to baseline ML models based on their importance ranking. The minimum variable sets were defined by the ML models (ML-Reduced, ML-Minimum, and ML-Image-Reduced) that achieved prognostic performance comparable to ML-All and ML-Image. Prognostic accuracy of the ML models was compared with visual diagnosis, stress total perfusion deficit (TPD), and traditional multivariable models using area under the receiver-operating characteristic curve (AUC). ML-Minimum (AUC 0.798) obtained comparable prognostic accuracy to ML-All (AUC 0.798, p = 0.18) by including 12 of 40 manually input variables and 11 of 58 imaging variables. ML-Reduced achieved comparable accuracy (AUC 0.795) with a reduced set of manually input variables and all imaging variables. In external validation, the ML models also obtained comparable or higher prognostic accuracy than traditional multivariable models.
Conclusion Reduced ML models, including a minimum set of manually collected or imaging variables, achieved slightly lower accuracy compared to a full ML model, but outperformed standard interpretation methods and risk models. ML models with fewer collected variables may be more practical for clinical implementation.
Translational perspective A reduced machine learning model, with 12 out of 40 manually collected variables and 11 of 58 imaging variables, achieved >99% of the prognostic accuracy of the full model. Models with fewer manually collected features require less infrastructure to implement, are easier for physicians to utilize, and are potentially critical to ensuring broader clinical implementation. Additionally, these models can integrate mechanisms to explain patient-specific risk estimates to improve physician confidence in the machine learning prediction.
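The incremental procedure described under Methods and results (add variables in order of importance until performance is comparable to the full model) can be illustrated with a minimal Python sketch. It is not the registry's implementation. The function names, the toy data, and the `evaluate` callback are hypothetical. Comparability is approximated here by a fixed AUC tolerance, which is an assumption made for illustration; the abstract reports a p-value, which implies a statistical comparison instead.

```python
from typing import Callable, Dict, List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    Ties between an event score and a non-event score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def smallest_comparable_subset(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
    full_auc: float,
    tolerance: float,
) -> List[str]:
    """Add variables in importance order; stop at the first subset whose
    AUC is within `tolerance` of the full model (illustrative criterion)."""
    selected: List[str] = []
    for name in ranked:
        selected.append(name)
        if evaluate(selected) >= full_auc - tolerance:
            break
    return selected


# Toy data (entirely made up): three variables, six patients.
LABELS = [1, 1, 1, 0, 0, 0]
FEATURES: Dict[str, List[float]] = {
    "a": [3, 2, 1, 2, 0, 0],
    "b": [0, 1, 2, 0, 1, 0],
    "c": [0, 0, 0, 0, 0, 1],
}


def toy_evaluate(selected: List[str]) -> float:
    """Stand-in for 'train a model and score it': the risk score is simply
    the sum of the selected variables for each patient."""
    scores = [sum(FEATURES[f][i] for f in selected) for i in range(len(LABELS))]
    return auc(LABELS, scores)


RANKED = ["a", "b", "c"]  # assumed importance ranking, most important first
FULL_AUC = toy_evaluate(RANKED)
MINIMUM = smallest_comparable_subset(RANKED, toy_evaluate, FULL_AUC, tolerance=0.01)
```

In this toy setup the loop stops after the second variable, because adding the third does not change the AUC. In practice `evaluate` would retrain and validate a model on each candidate subset, and the stopping rule would be a formal test of AUC difference rather than a fixed tolerance.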