Collect Insights of an H&N IMRT Planning AI Agent Through Analyzing Relationships Between Fluence Map Prediction Error and the Corresponding Dosimetric Impacts.

2021 
Purpose/objective(s) With many deep learning (DL) models being developed for clinical applications, it is important to understand their behavior and clinical consequences. This study aims to gain insight into the relationship between fluence map prediction error and its dosimetric impact in a DL-based AI agent for head and neck (H&N) IMRT planning. Materials/methods An AI agent has been implemented to generate IMRT plans via fluence map prediction, bypassing inverse optimization. While the prostate IMRT plans generated by the agent were comparable in quality to clinical plans, its application to H&N patients exhibited large variations in plan quality due to higher anatomical complexity. Because the DL model's output is the set of fluence maps of an IMRT plan, standard error analyses focus on the differences between the predicted and ground truth fluence maps, i.e., the prediction error. However, the ultimate plan evaluation is based on clinical criteria such as DVHs and dose distributions. The AI agent's clinical performance is therefore subject to complex and non-intuitive relationships between fluence map prediction error and the corresponding dose distribution changes, which warrant thorough investigation. In this study, a series of tests was designed to gain insight into the impact of DL model performance on a plan's dosimetric quality. The fluence map prediction error was analyzed for its dosimetric effects using five error decomposition modes: 1) ground truth fluence intensity bands at 5 threshold levels, 2) predicted fluence intensity bands at 5 threshold levels, 3) ground truth fluence gradient bands (high and low), 4) Fourier space bands (frequency bands) at 8 threshold levels, and 5) Fourier space circles (below a certain frequency) at 8 threshold levels. The DL model was trained with 216 cases and tested with 15 additional cases. PTV and OAR dosimetric metrics were analyzed with Spearman's rank correlation tests (significance level P = 0.05).
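The fifth decomposition mode (Fourier space circles) can be illustrated with a minimal sketch. It assumes each fluence map is a 2D NumPy array and that a "circle" keeps every frequency within a given pixel radius of the zero-frequency component; the function name `low_frequency_error` and the map size used below are hypothetical and are not taken from the study.

```python
import numpy as np

def low_frequency_error(pred, truth, radius):
    """Return the part of the prediction error that lies inside a
    Fourier-space circle of the given radius (in pixels).

    Hypothetical helper: the study's actual implementation is not described.
    """
    error = pred - truth
    # Move the zero-frequency component to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(error))
    rows, cols = error.shape
    y, x = np.ogrid[:rows, :cols]
    # Distance of every frequency bin from the centered zero frequency.
    dist = np.hypot(y - rows // 2, x - cols // 2)
    mask = dist <= radius
    # Keep only the bins inside the circle and transform back.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return filtered.real

# Synthetic example maps (not study data); the 64 x 64 size is an assumption.
rng = np.random.default_rng(0)
truth = rng.random((64, 64))
pred = truth + rng.normal(0.0, 0.05, size=(64, 64))
low_part = low_frequency_error(pred, truth, radius=8)
```

Repeating the call over a set of radii would give one error component per threshold level, each of which could then be applied to the ground truth fluence to study its dosimetric effect.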
Results Most PTV-related metrics were significantly correlated with the error components. Among the decomposition modes, the Fourier space circle radii had large Spearman's coefficients with PTV metrics, suggesting that this mode was best able to extract error components that reveal plan quality impacts. The low-frequency error within a Fourier space circle of radius = 32 pixels (20% of Fourier space) had the most significant impact on overall plan quality and PTV heterogeneity. Conclusion Fluence map prediction error analysis is critical for evaluating the AI agent's performance. Such insight will help fine-tune DL models in architecture design and loss function selection.
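As a rough illustration of the rank correlation used here, the sketch below computes Spearman's coefficient as the Pearson correlation of ranks. It assumes samples without ties, and the numbers are synthetic placeholders rather than study results; the names `spearman_rho`, `error_level`, and `ptv_metric` are hypothetical.

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman's rank coefficient for 1-D samples without ties."""
    # argsort of argsort gives each value's rank (0-based) when there are no ties.
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    # Spearman's coefficient is the Pearson correlation of the ranks.
    return np.corrcoef(rank_a, rank_b)[0, 1]

# Synthetic placeholder values, one pair per threshold level.
error_level = np.array([0.1, 0.2, 0.4, 0.8, 1.6])
ptv_metric = np.array([1.0, 1.3, 1.2, 2.0, 3.5])
rho = spearman_rho(error_level, ptv_metric)
```

In practice `scipy.stats.spearmanr` handles ties and also returns a p-value, which is what a comparison against a 0.05 significance level would require.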