Machine learning from clinical datasets of a contemporary decision for orthodontic tooth extraction.

2021 
Objective: To examine the robustness of published machine learning models in predicting extraction vs non-extraction for a diverse US sample population seen by multiple providers.

Setting and sample population: A diverse group of 838 patients (208 extraction, 630 non-extraction) was consecutively enrolled.

Materials and methods: Two sets of input features (117 and 22 features), comprising clinical and cephalometric variables, were identified based on previous studies. Random forest (RF) and multilayer perceptron (MLP) models were trained on the sample population using these feature sets and were evaluated with measures including accuracy (ACC) and balanced accuracy (BA). A technique for identifying incongruent data was used to explore the underlying characteristics of the data set and to split all samples into 2 groups (G1 and G2) for further model training.

Results: Performance of the models on the total sample population (75%-79% ACC and 72%-76% BA) was lower than in previous research. When the models were retrained and evaluated on G1 and G2 separately, the group-specific MLP models yielded improved accuracy for G1 (96% ACC and 94% BA) and G2 (88% ACC and 85% BA). RF feature ranking showed differences between the top features for G1 (maxillary crowding, mandibular crowding and L1-NB) and G2 (age, mandibular crowding and lower lip to E-plane).

Conclusions: An incongruent data pattern exists in a consecutively enrolled patient population. Future work with incongruent data segregation and advanced artificial intelligence algorithms is needed to improve the models' generalization ability before they are ready to support clinical decision-making.
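The abstract reports both accuracy (ACC) and balanced accuracy (BA). Because the sample is imbalanced (208 extraction vs 630 non-extraction), the two metrics can diverge noticeably. The minimal Python sketch below assumes the standard binary definitions (ACC as the fraction of correct predictions, BA as the mean of sensitivity and specificity, with extraction treated as the positive class). The confusion-matrix counts for the "hypothetical model" are invented for illustration and are not the study's results.

```python
"""Accuracy (ACC) vs balanced accuracy (BA) for a binary extraction decision.

Class sizes come from the abstract; the classifier counts are hypothetical,
chosen only to illustrate the metrics, not taken from the study.
"""


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    # Fraction of all patients classified correctly.
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    # Mean of per-class recall: sensitivity (extraction) and
    # specificity (non-extraction).
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Class sizes from the abstract: extraction = positive, non-extraction = negative.
N_EXT, N_NON = 208, 630

# Baseline that always predicts "non-extraction".
baseline = dict(tp=0, fn=N_EXT, tn=N_NON, fp=0)

# Hypothetical classifier: finds half of the extraction cases and
# 90% of the non-extraction cases.
model = dict(tp=104, fn=N_EXT - 104, tn=567, fp=N_NON - 567)

for name, cm in [("always non-extraction", baseline), ("hypothetical model", model)]:
    print(f"{name}: ACC={accuracy(**cm):.3f} BA={balanced_accuracy(**cm):.3f}")
```

With this class mix, a classifier that always predicts non-extraction already reaches about 75% ACC (630/838) but only 50% BA, which is why BA is a useful companion to ACC when the extraction class is the minority.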