Ultrasonographic risk stratification of indeterminate thyroid nodules; a comparison of an artificial intelligence algorithm with radiologist performance

Aylin Tahmasebi, Shuo Wang, Kelly Daniels, Elizabeth Cottrill, Ji-Bin Liu, Jiajun Xu, Andrej Lyshchik, John R. Eisenbrey

2020

Background, Motivation and Objective: Thyroid nodules with indeterminate or suspicious cytology are commonly encountered in clinical practice, and their clinical management is controversial. Recently, genetic analysis of thyroid fine-needle aspiration (FNA) samples was implemented at some institutions to classify thyroid nodules as high or low risk based on the presence of certain oncogenes commonly associated with aggressive tumor behavior and poor patient outcomes. Our group recently detailed the performance of a machine-learning model based on ultrasonography images of thyroid nodules for the prediction of high- and low-risk mutations. This study evaluated the performance of a second-generation machine-learning algorithm incorporating both object detection and image classification, and compared its performance against that of blinded radiologists.

Statement of Contribution/Methods: This retrospective study was conducted at Thomas Jefferson University and included 262 thyroid nodules that underwent ultrasound imaging, ultrasound-guided FNA, and next-generation sequencing (NGS) or surgical pathology after resection. An object detection model and an image classification model were employed, first to identify the location of each nodule and then to assess malignancy. A Google Cloud platform (AutoML Vision; Google LLC) was used for this purpose. NGS or surgical pathology, whichever was available, served as the reference standard. A total of 211 nodules were used for model development and the remaining 51 nodules for model testing. Diagnostic performance in the 47 nodules for which pathology or NGS was available was compared with blinded reads by 3 radiologists, and performance was expressed as mean $\pm$ standard deviation (%).

Results/Discussion: The algorithm achieved a positive predictive value (PPV) of 68.31% and a sensitivity of 86.81% within the training model. The model was then tested on images of the 51 unused nodules, and all 51 nodules were correctly located (100%). For risk stratification, the model demonstrated a sensitivity of 73.9%, specificity of 70.8%, PPV of 70.8%, negative predictive value (NPV) of 73.9%, and overall accuracy of 66.7% in the 47 nodules. For comparison, the performance of the 3 radiologists on this same dataset demonstrated a sensitivity of [...], specificity of [...], PPV of [...], NPV of [...], and overall accuracy of [...]. This work demonstrates that a machine-learning algorithm using image classification performed similarly to, if not slightly better than, 3 experienced radiologists. Future research will focus on incorporating machine-learning findings into radiologist interpretation to potentially improve diagnostic accuracy.
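As a reading aid, the risk-stratification metrics reported above can be recomputed from a 2x2 confusion matrix. The sketch below is illustrative only: the abstract does not report raw counts, so the values TP = 17, FN = 6, TN = 17, FP = 7 are an assumption, chosen because they sum to 47 nodules and reproduce the stated sensitivity, specificity, PPV and NPV. The names `ConfusionCounts` and `diagnostic_metrics` are hypothetical and not part of the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 confusion matrix; 'positive' means a high-risk nodule."""
    tp: int  # high-risk nodules classified as high risk
    fn: int  # high-risk nodules classified as low risk
    tn: int  # low-risk nodules classified as low risk
    fp: int  # low-risk nodules classified as high risk


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the standard diagnostic metrics as fractions in [0, 1]."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


# ASSUMPTION: counts inferred from the reported percentages, not taken
# from the abstract, which gives no raw confusion matrix.
counts = ConfusionCounts(tp=17, fn=6, tn=17, fp=7)
metrics = diagnostic_metrics(counts)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

Under these assumed counts, accuracy over the 47 nodules works out to 34/47, or about 72.3%, not the reported 66.7%. The reported figure equals 34/51, which would be consistent with all 51 test nodules being used as the denominator, but the abstract does not say how accuracy was calculated.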
