Deep Learning Auto-Segmentation Model for HN005 Contour Quality Assurance

2021 
Purpose/Objective(s): Ensuring the quality of radiotherapy is essential for the successful conduct of clinical trials. Accurate delineation of organs at risk (OARs) and targets in compliance with protocol definitions is critical and has historically been time-consuming. We established a workflow that uses deep learning-based auto-segmentation models for automated quality assurance (QA) in a head and neck multi-center clinical trial. Both in-house and commercially available models were tested. We report preliminary results.
Materials/Methods: Radiotherapy digital data from 184 patients enrolled in NRG-HN002 were used for this study. The 184 CT scans and their RT structures were reviewed and used for model training. Auto-segmentation models were trained for the mandible, brain stem, spinal cord, oral cavity, parotids, LarynxGSL, lips, esophagus, submandibular glands, and pharynx using convolutional neural networks (CNNs) enhanced with cascaded atrous convolution (CAC) and spatial pyramid pooling (SPP) modules. Twenty patients submitted to HN005 were selected for model testing to establish the QA workflow. CNN-based segmentation tools from Mirada (DLCExpert™), trained on separate data sets, were also included in the testing. Dice coefficient thresholds were established, with a 95% confidence interval, for detecting submitted segmentations with major variations.
Results: The in-house and commercial auto-segmentations showed high consensus with each other. Compared with the submitted segmentations, they achieved Dice similarity coefficients above 0.7 for the mandible, brain stem, spinal cord, parotids, and submandibular glands, and above 0.5 for the esophagus, oral cavity, larynx, and lips. Submitted segmentations with a Dice coefficient below the established threshold often had missing slices, redundant slices, or erroneous overlap with other structures. The detection specificity was higher than 0.9. Because of the irregular shape of the pharynx and its overlap with the target volume, the pharynx auto-segmentations did not achieve sufficient accuracy (Dice below 0.2).
Conclusion: Both commercial and in-house models demonstrated high specificity in detecting errors in submitted contours for all HN005-required OARs except the pharynx. Comparison between AI and expert reviews will be included in future studies.
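The Dice-based check described in the abstract can be illustrated with a minimal sketch. It assumes each contour has already been rasterized to a binary voxel mask on a common grid. The function names, structure labels, and threshold values are hypothetical and are not taken from the study. The abstract does not state how the thresholds were derived, so they are plain inputs here.

```python
import numpy as np


def dice_coefficient(submitted, reference):
    """Dice similarity coefficient, 2|A∩B| / (|A| + |B|), for two binary masks."""
    a = np.asarray(submitted, dtype=bool)
    b = np.asarray(reference, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention chosen
        # for this sketch, not something stated in the abstract).
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


def flag_for_review(dice_by_structure, threshold_by_structure):
    """Return the structures whose Dice falls below their (hypothetical) threshold."""
    return sorted(
        name
        for name, dice in dice_by_structure.items()
        if dice < threshold_by_structure[name]
    )


def specificity(flagged, has_error):
    """Specificity = TN / (TN + FP) of the flagging rule against a reference review."""
    flagged = np.asarray(flagged, dtype=bool)
    has_error = np.asarray(has_error, dtype=bool)
    true_neg = np.sum(~flagged & ~has_error)
    false_pos = np.sum(flagged & ~has_error)
    denom = true_neg + false_pos
    return float(true_neg / denom) if denom else float("nan")


# Toy example: the submitted contour is missing one of two slices.
reference = np.zeros((4, 4, 4), dtype=bool)
reference[1:3, 1:3, 1:3] = True      # 8 voxels from the auto-segmentation
submitted = np.zeros((4, 4, 4), dtype=bool)
submitted[1:3, 1:3, 1:2] = True      # 4 voxels, one slice missing

toy_dice = dice_coefficient(submitted, reference)  # 2*4 / (4 + 8) = 0.667
```

In this toy case a single missing slice lowers the Dice to about 0.67, which would fall under a hypothetical 0.7 threshold and be flagged for review.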