Deep Learning and Harmonization of Multi-Institutional Data for Automated Gross Tumor and Nodal Segmentation for Oropharyngeal Cancer.

2021 
Purpose/objective(s): Automated tumor segmentation for oropharyngeal cancer (OPC) has the potential to improve treatment planning, response assessment, and clinical translation of imaging-based biomarkers. Deep learning has shown promise for cancer imaging segmentation, but performance for OPC tumors has been suboptimal, with studies generally limited to small, single-institution settings. In this study, we curated and harmonized multiple heterogeneous, multi-institutional datasets to develop and validate computed tomography (CT)-based deep learning models for total gross tumor volume (GTV), primary tumor (GTVp), and nodal (GTVn) segmentation.

Materials/methods: Data were obtained from The Cancer Imaging Archive (TCIA) and comprised 1228 CT simulation scans, collected from 2003 to 2014 at four institutions, from OPC patients treated with definitive radiotherapy. Each scan included the original GTV as delineated by the treating radiation oncologist. GTVs were manually reviewed, and ground-truth labels were harmonized to include distinct GTVp and GTVn labels. Cases were split randomly: 70% were used for model training, 15% for tuning, and 15% for independent performance testing. Using a modified 3D U-Net-based architecture, models were trained and tuned to predict total GTV, GTVp, and GTVn. Model performance was assessed by measuring precision, recall, and Dice score coefficient (DSC) between deep learning-generated and ground-truth volumes in the independent test set. Patients without contrast-enhanced CT were excluded from the GTVp model a priori, given the importance of contrast in delineating the primary tumor.

Results: Median performance on the test sets is reported with 95% confidence intervals (CI); the GTVp test set was limited to contrast-enhanced scans. Within the total dataset (n = 1228), median age was 59, 82% of cases were male, and HPV status was 49% positive, 16% negative, and 35% unknown. Tumor staging was 45% T3-T4, 54% T1-T2, and 1% T0/X. Nodal staging was 75% N2-N3 and 25% N0-N1. Overall, the GTVn model had the highest performance (median DSC: 0.76; 95% CI: 0.73, 0.77), followed by total GTV (median DSC: 0.71; 95% CI: 0.68, 0.73) and GTVp (median DSC: 0.68; 95% CI: 0.63, 0.71).

Conclusion: Deep learning with multiple harmonized data sources can yield effective models for OPC primary and nodal segmentation using CT alone. The utility of these models will depend on the clinical use case and will be explored in further investigation, though current model performance metrics, particularly for nodal segmentation, are likely adequate for prospective testing in clinical and research applications.
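The evaluation described in the abstract (a random 70/15/15 case split, voxel-wise precision, recall, and DSC, and a median score with a 95% confidence interval) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration, not the authors' code: the function names are hypothetical, masks are represented as flat lists of 0/1 voxels, an empty-versus-empty comparison is scored as DSC 1.0 by convention, and the interval uses a percentile bootstrap, since the abstract does not state how its confidence intervals were computed.

```python
import random
import statistics


def overlap_metrics(pred, truth):
    """Voxel-wise precision, recall, and Dice for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Dice = 2*TP / (2*TP + FP + FN); scored 1.0 when both masks are empty
    # (an assumed convention, not taken from the abstract).
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return precision, recall, dice


def split_cases(case_ids, seed=0):
    """Randomly split cases 70/15/15 into training, tuning, and test sets."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(0.70 * len(ids))
    n_tune = round(0.15 * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_tune], ids[n_train + n_tune:]


def median_with_bootstrap_ci(scores, n_boot=2000, seed=0):
    """Median of per-case scores with a 95% percentile-bootstrap interval.

    The bootstrap is an assumed method; the abstract does not specify one.
    """
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(scores, k=len(scores)))
        for _ in range(n_boot)
    )
    lower = medians[int(0.025 * n_boot)]
    upper = medians[int(0.975 * n_boot) - 1]
    return statistics.median(scores), lower, upper
```

In practice, `overlap_metrics` would be applied to each test case's flattened predicted and ground-truth masks, and the resulting per-case DSC values passed to `median_with_bootstrap_ci` to obtain a summary comparable in form to the medians and intervals reported above.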