Creating Robust Predictive Radiomic Models for Data From Independent Institutions Using Normalization

2019 
Purpose: The distribution of a radiomic feature can differ between two institutions because of, for example, different image acquisition parameters, different imaging systems, and variations in contouring (i.e., tumor delineation) between clinicians. We aimed to develop effective statistical methods for successfully applying a radiomics-based predictive model to an external dataset.

Theory: Two common feature normalization methods, rescaling and standardization, were evaluated for their suitability in reducing feature variability between institutions. Standardization was chosen as the preferred approach, since rescaling was more sensitive to statistical outliers and potentially reduced the discrimination power of a feature. It was also demonstrated why a dataset needs to be balanced between positive and negative outcomes before standardization is applied to it.

Methods: The novelty and power of the developed method for applying radiomic models to external datasets lie in finding the normalization transformation separately for each independent set. The clinical effectiveness of the normalization method was shown using magnetic resonance images of primary uterine adenocarcinoma. Feature selection used 94 samples from Institution X, and feature testing used 63 samples from Institution Y. The outcomes studied were lymphovascular space invasion and cancer staging. Logistic regression was used to obtain the prediction accuracy of each feature. Promising radiomic features were defined as those with AUC > 0.75 in the training set.

Results: Comparing the prediction accuracy, F-score, and Matthews correlation coefficient (MCC) of the promising radiomic features in the testing set with and without standardization showed an improvement due to standardization. For cancer stage prediction, the average accuracy over all promising features rose from 0.64 to 0.72, the average F-score from 0.48 to 0.71, and the average MCC from 0.34 to 0.44 (p …).
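The normalization ideas described above (rescaling versus standardization, balancing outcomes before standardizing, and fitting the transformation separately for each institution) can be illustrated with a minimal NumPy sketch. It assumes one feature vector per institution with binary outcome labels. The balancing scheme, which gives each outcome class a total weight of one half when computing the mean and standard deviation, is an illustrative assumption rather than the authors' published procedure, and the function names `rescale`, `standardize`, and `balanced_standardize` are hypothetical. The toy data model Institution Y as a linear distortion (2x + 10) of Institution X's values with twice as many positive cases.

```python
import numpy as np


def rescale(x):
    """Min-max rescaling to [0, 1]; the two extreme values alone set the scale,
    so a single outlier compresses every other value into a narrow band."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def standardize(x):
    """Plain z-score standardization using the set's own mean and (population) SD."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def balanced_standardize(x, y):
    """Z-score one institution's feature with class-balanced statistics.

    Assumed balancing scheme (illustrative, not the paper's exact procedure):
    each outcome class carries total weight 0.5 in the mean and variance, so
    the more frequent class cannot pull the statistics toward itself.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    pos, neg = x[y == 1], x[y == 0]
    mu = 0.5 * pos.mean() + 0.5 * neg.mean()
    var = 0.5 * np.mean((pos - mu) ** 2) + 0.5 * np.mean((neg - mu) ** 2)
    return (x - mu) / np.sqrt(var)


if __name__ == "__main__":
    # Institution X: balanced toy feature values (3 negatives, 3 positives).
    x_X = [1, 2, 3, 5, 6, 7]
    y_X = [0, 0, 0, 1, 1, 1]
    # Institution Y: the same values distorted as 2x + 10 (a different scanner),
    # with the positive cases duplicated to make the set imbalanced.
    x_Y = [12, 14, 16, 20, 22, 24, 20, 22, 24]
    y_Y = [0, 0, 0, 1, 1, 1, 1, 1, 1]

    z_X = balanced_standardize(x_X, y_X)  # transformation fitted on X only
    z_Y = balanced_standardize(x_Y, y_Y)  # separate transformation fitted on Y
    print(np.allclose(z_Y[:6], z_X))               # True
    print(np.allclose(standardize(x_Y)[:6], z_X))  # False
    print(rescale([1, 2, 3, 4, 100]))  # first four values squeezed below 0.031
```

Because each set gets its own statistics, the scanner-like offset and scale cancel, and the balancing keeps Institution Y's extra positive cases from shifting its mean. Plain standardization of the imbalanced set does not reproduce Institution X's values.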
With standardization, the ratio of sensitivity to specificity in the testing set was close to unity, comparable to the ratio in the training set; without standardization, this ratio deviated significantly from unity in the testing set.

Conclusions: Applying feature standardization separately to each independent set, with adjustments for class imbalance, was shown to improve the predictive ability of radiomic models applied to a dataset from an external institution.
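The feature screening (AUC > 0.75 in the training set) and the evaluation metrics reported in the abstract can be sketched as follows. This NumPy-only version computes the AUC directly as a rank statistic instead of fitting a logistic regression: a one-feature logistic model is monotonic in its feature, so its AUC equals the feature's own AUC in the direction of the fitted coefficient, and taking the better of the two directions here is a simplifying assumption. The F-score is assumed to be F1, and the function names are hypothetical.

```python
import numpy as np


def auc(scores, y):
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count 1/2), i.e. the Mann-Whitney U statistic / (n1 * n0)."""
    scores = np.asarray(scores, dtype=float)
    y = np.asarray(y)
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]  # all positive/negative pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def screen_features(X, y, threshold=0.75):
    """Indices of columns whose single-feature AUC exceeds the threshold.
    max(a, 1 - a) stands in for letting the fitted coefficient choose the sign."""
    X = np.asarray(X, dtype=float)
    keep = []
    for j in range(X.shape[1]):
        a = auc(X[:, j], y)
        if max(a, 1.0 - a) > threshold:
            keep.append(j)
    return keep


def binary_metrics(y_true, y_pred):
    """Accuracy, F1 score, MCC, and sensitivity/specificity ratio from 0/1 labels.
    Assumes both outcome classes are present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    denom = np.sqrt(float(tp + fp) * float(tp + fn) * float(tn + fp) * float(tn + fn))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "f_score": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / denom if denom > 0 else 0.0,
        "sens_spec_ratio": sens / spec if spec > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Toy training set: column 0 is borderline, columns 1 and 2 separate perfectly.
    y = np.array([0, 0, 1, 1])
    X = np.array([[0.1, 1, 4], [0.4, 2, 3], [0.35, 3, 2], [0.8, 4, 1]])
    print(screen_features(X, y))  # [1, 2]
    print(binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]))
```

A sensitivity/specificity ratio near 1 means the classification threshold carried over from the training set still splits the testing set evenly; a ratio far from 1 indicates the threshold has shifted relative to the test distribution.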