Evaluation of a convolutional neural network for ovarian tumor differentiation based on magnetic resonance imaging.
OBJECTIVES There is currently no noninvasive and accurate method for distinguishing benign from malignant ovarian lesions prior to treatment. This study developed a deep learning algorithm that distinguishes benign from malignant ovarian lesions by applying a convolutional neural network to routine MR imaging.
METHODS Five hundred forty-five lesions (379 benign and 166 malignant) from 451 patients at a single institution were divided into training, validation, and testing sets in a 7:2:1 ratio. Model performance was compared with that of four junior and three senior radiologists on the test set.
RESULTS Compared with the average of the junior radiologists, the final ensemble model combining MR imaging and clinical variables had higher test accuracy (0.87 vs 0.64, p < 0.001) and specificity (0.92 vs 0.64, p < 0.001) with comparable sensitivity (0.75 vs 0.63, p = 0.407). Compared with the average of the senior radiologists, the final ensemble model also had higher test accuracy (0.87 vs 0.74, p = 0.033) and specificity (0.92 vs 0.70, p < 0.001) with comparable sensitivity (0.75 vs 0.83, p = 0.557). Assisted by the model's probabilities, the junior radiologists achieved higher average test accuracy (0.77 vs 0.64, Δ = 0.13, p < 0.001) and specificity (0.81 vs 0.64, Δ = 0.17, p < 0.001) with no significant change in sensitivity (0.69 vs 0.63, Δ = 0.06, p = 0.302). When assisted by the model's probabilities, the junior radiologists had higher specificity than the senior radiologists (0.81 vs 0.70, Δ = 0.11, p = 0.005) but similar accuracy (0.77 vs 0.74, Δ = 0.03, p = 0.409) and sensitivity (0.69 vs 0.83, Δ = -0.146, p = 0.097).
CONCLUSIONS These results demonstrate that artificial intelligence based on deep learning can assist radiologists in assessing the nature of ovarian lesions and improve their performance.
KEY POINTS
• Artificial intelligence based on deep learning can assess the nature of ovarian lesions on routine MRI with higher accuracy and specificity than radiologists.
• Assisted by the deep learning model's probabilities, junior radiologists achieved better performance that matched that of senior radiologists.
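
The results above are reported as accuracy, sensitivity, and specificity. As a reminder of how these three measures relate to one another, the following is a minimal sketch of their standard definitions for a binary classifier. It is not the authors' evaluation code: the function name `confusion_metrics`, the choice of malignant as the positive class (label 1), and the eight toy labels are all assumptions made for illustration and are not data from the study.

```python
import math


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels.

    Assumption for this sketch: 1 = malignant (positive), 0 = benign (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malignant, called malignant
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, called benign
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, called malignant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malignant, called benign

    def ratio(num, den):
        # Undefined when a class is absent; return NaN instead of raising.
        return num / den if den else math.nan

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # fraction of malignant lesions detected
        "specificity": ratio(tn, tn + fp),  # fraction of benign lesions correctly cleared
    }


# Hypothetical toy labels, not study data: 3 malignant and 5 benign lesions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = confusion_metrics(y_true, y_pred)
print(metrics)
```

Because benign lesions outnumber malignant ones in the cohort (379 vs 166), accuracy is weighted more heavily by specificity than by sensitivity, which is why the three measures are reported separately.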