An optimal machine learning model for breast lesion classification based on random projection algorithm for feature optimization

2021 
The purpose of this study is to develop a machine learning model that uses optimal features computed from mammograms to classify suspicious regions as benign or malignant. To this end, we investigate the benefits of embedding a random projection algorithm in a machine learning approach to generate an optimal feature vector and improve classification performance. A retrospective dataset of 1,487 cases is used. Among them, 644 cases depict malignant lesions, while the remaining 843 cases are benign. The locations of all suspicious regions were previously annotated by radiologists. A computer-aided detection scheme is applied to pre-process the images and compute an initial set of 181 features. Then, three support vector machine (SVM) models are built: one using the initial feature set, and two embedded with feature regeneration methods, namely principal component analysis and the random projection algorithm, to reduce the dimensionality of the feature space and generate smaller optimal feature vectors. All SVM models are trained and tested using the leave-one-case-out cross-validation method to classify cases as malignant or benign. The data analysis results show that the three SVM models yield areas under the ROC curve of AUC = 0.72±0.02, 0.79±0.01 and 0.84±0.018, respectively. Thus, this study demonstrates that applying a random projection algorithm makes it possible to generate optimal feature vectors and significantly improve the performance of a machine learning model (i.e., SVM) in classifying mammographic lesions (p<0.02). A similar approach can also be applied to train machine learning models more effectively and improve their performance in other types of medical imaging applications.
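The dimensionality-reduction step described above can be illustrated with a minimal sketch of a Gaussian random projection. This is not the authors' implementation: the abstract does not state which random projection variant or reduced dimension was used, so the Gaussian matrix, the target size of 64 (`N_COMPONENTS`), the helper names, and the synthetic vectors are all assumptions made for illustration. Only the initial feature count of 181 comes from the abstract. The sketch shows the property that motivates the technique: distances between feature vectors are approximately preserved after projection.

```python
import math
import random


def gaussian_projection_matrix(n_features, n_components, seed=0):
    """Return an n_components x n_features matrix with N(0, 1/n_components) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_components)  # standard deviation of each entry
    return [[rng.gauss(0.0, scale) for _ in range(n_features)]
            for _ in range(n_components)]


def project(matrix, vector):
    """Map one feature vector into the lower-dimensional space."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Synthetic stand-ins for two 181-feature lesion vectors (not real data).
rng = random.Random(42)
N_FEATURES = 181     # initial feature count reported in the abstract
N_COMPONENTS = 64    # hypothetical reduced dimension, chosen for illustration
lesion_a = [rng.random() for _ in range(N_FEATURES)]
lesion_b = [rng.random() for _ in range(N_FEATURES)]

R = gaussian_projection_matrix(N_FEATURES, N_COMPONENTS, seed=1)
reduced_a = project(R, lesion_a)
reduced_b = project(R, lesion_b)

# A ratio near 1 means the pairwise distance survived the projection.
ratio = euclidean(reduced_a, reduced_b) / euclidean(lesion_a, lesion_b)
print(f"reduced dimension: {len(reduced_a)}")
print(f"distance ratio after projection: {ratio:.3f}")
```

In a full pipeline the reduced vectors would then be passed to a classifier such as an SVM and evaluated with leave-one-case-out cross-validation, as the abstract describes. Unlike principal component analysis, the projection matrix here is drawn independently of the data.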