Circulating Tumor Cell Assay Enables Prediction of Recurrence Following Stereotactic Body Radiotherapy for Early-Stage Non-Small Cell Lung Cancer: An Interpretable Machine Learning Study.

2021 
PURPOSE/OBJECTIVE(S) Despite the effectiveness of stereotactic body radiotherapy (SBRT) in the local control of early-stage non-small cell lung cancer (ES-NSCLC), recurrence rates remain high. Outcomes may be improved by selecting high-risk patients for intensified treatment, e.g., early salvage radiotherapy or adjuvant systemic therapy/immunotherapy. We previously reported that a high circulating tumor cell (CTC) count is prognostic for increased recurrence risk. In this study, based on the same patient cohort, we aimed to determine a CTC count threshold using the explainable boosting machine (EBM), a new interpretable machine learning model.

MATERIALS/METHODS Our dataset comprised de-identified patients (n = 92) with biopsy-proven or clinically presumed ES-NSCLC, enrolled between August 2013 and October 2018. All patients received SBRT to a median dose of 50 Gy. Peripheral blood samples for CTC assays were collected from each patient pre-treatment, on-treatment (between the first and last fraction), and post-treatment at follow-up intervals of 1, 3, 6, 12, 18 and 24 months. EBM is a recently developed open-source machine learning model. It is an efficient implementation of the generalized additive model with pairwise interactions (GA2M), which yields high intelligibility without sacrificing accuracy. We used the explainable boosting classifier from EBM to predict recurrence, defined as any local, nodal or distant failure during the 2-year follow-up. In addition to the CTC counts, we selected the following clinicopathologic parameters known before treatment as training features: gender, race, body mass index, smoker status, pack-years, age at enrollment, tumor size, stage, histology, and fractionation. The dataset was split by 5-fold cross-validation stratified on the recurrence label, so that each fold used 80% of patients for training and 20% for testing.

RESULTS EBM was trained on each feature in a round-robin fashion via sequential gradient boosting. The model scored the pre-treatment CTC count (PreCTC) and tumor size as the most important features, accounting for 9.7% and 9.3% of the overall importance score, respectively. AJCC Stage and T Stage scored lower, at 3.7% each. The average area under the curve (AUC) across folds was 0.713 ± 0.099. From the model, we extracted the 2-year recurrence risk as a function of PreCTC and found a peak at a PreCTC of 6.

CONCLUSION EBM scored PreCTC as the most influential feature for predicting recurrence, in agreement with our previous findings; additionally, we extracted a PreCTC threshold of 6 for clinical stratification of ES-NSCLC patients. These results illustrate how EBM may be a useful and innovative tool for machine-learning-aided translation of clinical datasets.
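The round-robin boosting idea behind EBM can be illustrated with a minimal pure-Python sketch. This is not the authors' code and not the EBM library: it is a simplified additive model under stated assumptions. It uses equal-width bins and a per-bin mean-residual update in place of EBM's shallow trees, omits pairwise interaction terms, and runs on synthetic data. The names `fit_cyclic_gam`, `predict_proba` and `importances`, and the two synthetic features (one CTC-like count, one noise variable), are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bin_index(value, lo, hi, n_bins):
    # Equal-width binning, clipped to the range seen during training.
    if hi == lo:
        return 0
    b = int((value - lo) / (hi - lo) * n_bins)
    return min(max(b, 0), n_bins - 1)


def fit_cyclic_gam(X, y, n_bins=8, rounds=100, lr=0.1):
    # Fit one binned shape function per feature by cycling through features.
    n, d = len(X), len(X[0])
    lo = [min(r[j] for r in X) for j in range(d)]
    hi = [max(r[j] for r in X) for j in range(d)]
    bins = [[bin_index(r[j], lo[j], hi[j], n_bins) for j in range(d)] for r in X]
    base = min(max(sum(y) / n, 1e-6), 1 - 1e-6)
    intercept = math.log(base / (1 - base))
    shapes = [[0.0] * n_bins for _ in range(d)]
    scores = [intercept] * n
    for _ in range(rounds):
        for j in range(d):  # round-robin: update one feature at a time
            sums, counts = [0.0] * n_bins, [0] * n_bins
            for i in range(n):
                b = bins[i][j]
                sums[b] += y[i] - sigmoid(scores[i])  # log-loss residual
                counts[b] += 1
            steps = [lr * s / c if c else 0.0 for s, c in zip(sums, counts)]
            for b in range(n_bins):
                shapes[j][b] += steps[b]
            for i in range(n):
                scores[i] += steps[bins[i][j]]
    return {"intercept": intercept, "shapes": shapes,
            "lo": lo, "hi": hi, "n_bins": n_bins}


def contribution(model, j, value):
    # Additive log-odds contribution of feature j at the given value.
    b = bin_index(value, model["lo"][j], model["hi"][j], model["n_bins"])
    return model["shapes"][j][b]


def predict_proba(model, row):
    z = model["intercept"] + sum(contribution(model, j, v) for j, v in enumerate(row))
    return sigmoid(z)


def importances(model, X):
    # Mean absolute contribution per feature, normalized to sum to 1.
    d = len(model["shapes"])
    raw = [sum(abs(contribution(model, j, r[j])) for r in X) / len(X) for j in range(d)]
    total = sum(raw) or 1.0
    return [v / total for v in raw]


# Synthetic data only: feature 0 mimics a count that drives the label,
# feature 1 is pure noise; 10% of labels are flipped at random.
random.seed(0)
X = [[random.randint(0, 12), random.random()] for _ in range(200)]
y = [int((row[0] >= 6) != (random.random() < 0.1)) for row in X]

model = fit_cyclic_gam(X, y)
imp = importances(model, X)
print("importance share per feature:", [round(v, 3) for v in imp])
print("shape function for feature 0:", [round(v, 2) for v in model["shapes"][0]])
```

Because the model is additive, each feature's learned shape function can be read directly as a risk curve over that feature, which is the property that allows a threshold to be extracted from a fitted model.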