Derivation and validation of a machine learning risk score using biomarker and electronic patient data to predict rapid progression of diabetic kidney disease

2020 
Importance: Diabetic kidney disease (DKD) is the leading cause of kidney failure in the United States, and predicting progression is necessary for improving outcomes.
Objective: To develop and validate a machine-learned prognostic risk score (KidneyIntelX™) combining data from electronic health records (EHR) and circulating biomarkers to predict DKD progression.
Design: Observational cohort study.
Setting: Two EHR-linked biobanks: the Mount Sinai BioMe Biobank and the Penn Medicine Biobank.
Participants: Patients with prevalent DKD (G3a-G3b with all grades of albuminuria (A1-A3), and G1 and G2 with A2-A3 level albuminuria) and banked plasma.
Main outcomes and measures: Plasma biomarkers soluble tumor necrosis factor receptor 1/2 (sTNFR1, sTNFR2) and kidney injury molecule-1 (KIM-1) were measured at baseline. Patients were divided into derivation (60%) and validation (40%) sets. The primary end point was a composite of rapid kidney function decline (RKFD; estimated glomerular filtration rate (eGFR) decline of ≥5 ml/min/1.73 m²/year), a sustained decline of ≥40%, or kidney failure within 5 years. A machine learning model (random forest) was trained, and its performance was assessed using standard metrics.
Results: In 1146 patients with DKD, the median age was 63 years, 51% were female, median baseline eGFR was 54 ml/min/1.73 m², urine albumin to creatinine ratio (uACR) was 61 mg/g, and follow-up was 4.3 years. 241 patients (21%) experienced RKFD. On 10-fold cross-validation in the derivation set (n=686), the risk model had an area under the curve (AUC) of 0.77 (95% CI 0.74-0.79). In validation (n=460), the AUC was 0.77 (95% CI 0.76-0.79). By comparison, the AUC for an optimized clinical model was 0.62 (95% CI 0.61-0.63) in derivation and 0.61 (95% CI 0.60-0.63) in validation. Using cutoffs from derivation, KidneyIntelX stratified 47%, 37%, and 16% of the validation cohort into low-, intermediate-, and high-risk groups, with a positive predictive value (PPV) of 62% (vs. 41% for KDIGO) in the high-risk group and a negative predictive value (NPV) of 91% in the low-risk group. The net reclassification index for events into the high-risk group was 41% (p<0.05).
Conclusions and Relevance: A machine-learned model combining plasma biomarkers and EHR data improved prediction of adverse kidney events within 5 years over KDIGO and standard clinical models in patients with early DKD.
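The stratified metrics reported in the Results (PPV in the high-risk group, NPV in the low-risk group) can be illustrated with a minimal Python sketch. Everything here is hypothetical: the function names, the two cutoffs (0.25 and 0.75), and the toy scores and outcomes are illustrative assumptions, not the study's cutoffs, data, or implementation.

```python
from typing import Sequence, Tuple


def stratify(score: float, low_cutoff: float, high_cutoff: float) -> str:
    """Map a continuous risk score to a stratum using two cutoffs (hypothetical values)."""
    if score < low_cutoff:
        return "low"
    if score >= high_cutoff:
        return "high"
    return "intermediate"


def ppv_npv(
    scores: Sequence[float],
    events: Sequence[int],
    low_cutoff: float,
    high_cutoff: float,
) -> Tuple[float, float]:
    """Return (PPV in the high-risk stratum, NPV in the low-risk stratum).

    events[i] is 1 if patient i reached the composite end point, else 0.
    """
    strata = [stratify(s, low_cutoff, high_cutoff) for s in scores]
    high = [e for st, e in zip(strata, events) if st == "high"]
    low = [e for st, e in zip(strata, events) if st == "low"]
    # PPV: share of high-risk patients who had the event.
    ppv = sum(high) / len(high) if high else float("nan")
    # NPV: share of low-risk patients who did not have the event.
    npv = sum(1 - e for e in low) / len(low) if low else float("nan")
    return ppv, npv


# Toy data only; not drawn from the study.
scores = [0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80, 0.90]
events = [0, 0, 1, 0, 1, 0, 1, 1]
ppv, npv = ppv_npv(scores, events, low_cutoff=0.25, high_cutoff=0.75)
print(f"PPV (high-risk) = {ppv:.2f}, NPV (low-risk) = {npv:.2f}")
```

In the study's design the cutoffs were fixed in the derivation set and then applied unchanged to the validation set, which is why the sketch takes them as inputs instead of estimating them from the data it scores.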