Applying a random projection algorithm to optimize machine learning model for breast lesion classification.

2021 
$Objective:$ Computer-aided diagnosis (CAD) schemes of medical images usually compute a large number of image features, which makes it challenging to identify a small and optimal feature vector for building robust machine learning models. The objective of this study is to investigate the feasibility of applying a random projection algorithm (RPA) to build an optimal feature vector from the large feature pool initially generated by CAD and thereby improve the performance of the machine learning model. $Methods:$ We assemble a retrospective dataset of 1,487 mammography cases, of which 644 have confirmed malignant mass lesions and 843 have benign lesions. A CAD scheme is first applied to segment mass regions and initially compute 181 features. Then, support vector machine (SVM) models embedded with several feature dimensionality reduction methods are built to predict the likelihood of lesions being malignant. All SVM models are trained and tested using a leave-one-case-out cross-validation method. The SVM generates a likelihood score for each segmented mass region depicted on a one-view mammogram. By fusing the two scores of the same mass depicted on two-view mammograms, a case-based likelihood score is also evaluated. $Results:$ Compared with principal component analysis, nonnegative matrix factorization, and Chi-squared methods, the SVM embedded with RPA yielded a significantly higher case-based lesion classification performance, with an area under the ROC curve of 0.84±0.01 (p<0.02). $Conclusion:$ The study demonstrates that RPA is a promising method to generate optimal feature vectors and improve SVM performance. $Significance:$ This study presents a new method to develop CAD schemes with significantly higher and more robust performance.
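To make the dimensionality reduction step concrete, the following is a minimal sketch of a Gaussian random projection in plain Python. It is not the authors' implementation: the abstract does not state which random projection variant or target dimension was used, so the Gaussian matrix, the target size `n_components = 40`, the seeds, and the function names are all illustrative assumptions. Only the input dimension of 181 features comes from the abstract. The sketch projects two synthetic feature vectors and compares their distance before and after projection, which is the property that makes random projection usable ahead of a distance-sensitive classifier such as an SVM.

```python
import math
import random


def gaussian_projection_matrix(n_features, n_components, seed=0):
    """Return an n_features x n_components matrix with N(0, 1/n_components) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_components)
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(n_components)]
        for _ in range(n_features)
    ]


def project(x, matrix):
    """Map a feature vector x to the lower-dimensional space (x times matrix)."""
    n_components = len(matrix[0])
    return [
        sum(x[i] * matrix[i][j] for i in range(len(x)))
        for j in range(n_components)
    ]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


n_features = 181    # size of the initial CAD feature pool (from the abstract)
n_components = 40   # hypothetical target dimension, chosen only for illustration

# Two synthetic feature vectors standing in for two segmented mass regions.
data_rng = random.Random(1)
a = [data_rng.random() for _ in range(n_features)]
b = [data_rng.random() for _ in range(n_features)]

R = gaussian_projection_matrix(n_features, n_components)
a_low = project(a, R)
b_low = project(b, R)

# A ratio near 1 means the pairwise distance is roughly preserved.
ratio = euclidean(a_low, b_low) / euclidean(a, b)
print(f"reduced dimension: {len(a_low)}, distance ratio: {ratio:.3f}")
```

In a full pipeline the projected vectors, rather than the original 181 features, would be passed to the classifier. Because the projection matrix is random and independent of the labels, it can be drawn once and reused across cross-validation folds without leaking label information.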