A novel feature reduction method to improve performance of machine learning model

2021 
Developing radiomics-based machine learning (ML) models has drawn considerable attention in recent years. However, identifying a small, optimal feature vector with which to build a robust ML model remains a controversial issue. In this study, we investigated the feasibility of applying a random projection algorithm to create an optimal feature vector from a large feature pool generated by a computer-aided detection (CAD) scheme, and thereby improve the performance of the ML model. We assembled a retrospective dataset of abdominal computed tomography (CT) images acquired from 188 patients diagnosed with gastric cancer. Among them, 141 cases have peritoneal metastasis (PM) and 47 cases do not. The CAD scheme is applied to segment the gastric tumor area and compute 325 image features. Two logistic regression models are then built, each embedded with a different feature dimensionality reduction method: principal component analysis (PCA) or a random projection algorithm (RPA). Afterward, the synthetic minority oversampling technique (SMOTE) is used to balance the dataset. The models are built to predict the risk of PM in patients with advanced gastric cancer (AGC). All logistic regression models are trained and tested using leave-one-case-out cross-validation. Results show that the logistic regression model embedded with RPA yielded a significantly higher area under the ROC curve (AUC) (0.69±0.025) than the model using PCA (0.62±0.014) (p<0.05). The study demonstrates that CT images of gastric tumors contain discriminatory information for predicting the risk of PM in AGC patients, and that RPA is a promising method for generating an optimal feature vector that improves the performance of ML models of medical images.
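The abstract describes the pipeline only at a high level. It does not give the projection dimension, the SMOTE neighbour count, or the solver settings. The following is a minimal NumPy sketch, built on our own assumptions, of how a PCA-versus-random-projection comparison of this kind could be organised: reduce the feature pool to k dimensions, balance the classes with SMOTE, fit a logistic regression, score each held-out case with leave-one-case-out cross-validation, and summarise with AUC. The helper names (`pca_reduce`, `random_project`, `smote`, `loocv_auc`), k = 10, the gradient-descent logistic regression, the re-standardisation after reduction, fitting every step on the training fold only, and the synthetic data are all hypothetical illustrations, not the authors' implementation. In practice, scikit-learn's `GaussianRandomProjection`, `PCA`, and `LogisticRegression`, together with imbalanced-learn's `SMOTE`, would be the usual building blocks.

```python
import numpy as np


def standardize(train, test):
    """Z-score both sets using statistics from the training fold only."""
    mu, sd = train.mean(axis=0), train.std(axis=0) + 1e-8
    return (train - mu) / sd, (test - mu) / sd


def pca_reduce(train, test, k, rng=None):
    """Project onto the top-k principal components of the training fold (rng unused)."""
    mu = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mu, full_matrices=False)
    return (train - mu) @ vt[:k].T, (test - mu) @ vt[:k].T


def random_project(train, test, k, rng):
    """Gaussian random projection: a data-independent d x k matrix with N(0, 1/k) entries."""
    r = rng.normal(0.0, 1.0 / np.sqrt(k), size=(train.shape[1], k))
    return train @ r, test @ r


def smote(x, y, rng, k_neighbors=5):
    """Minimal SMOTE: add interpolated minority samples until both classes are equal in size."""
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[np.argmin(counts)]
    xm = x[y == minority]
    n_new = counts.max() - counts.min()
    dist = np.linalg.norm(xm[:, None, :] - xm[None, :, :], axis=2)
    neigh = np.argsort(dist, axis=1)[:, 1:k_neighbors + 1]  # column 0 is the point itself
    base = rng.integers(0, len(xm), n_new)  # minority sample to start from
    partner = neigh[base, rng.integers(0, neigh.shape[1], n_new)]  # one of its neighbours
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    synth = xm[base] + gap * (xm[partner] - xm[base])
    return np.vstack([x, synth]), np.concatenate([y, np.full(n_new, minority)])


def fit_logreg(x, y, lr=0.1, epochs=300, l2=1e-2):
    """Batch gradient descent on the L2-regularised logistic loss (bias not penalised)."""
    xb = np.hstack([x, np.ones((len(x), 1))])
    w = np.zeros(xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-np.clip(xb @ w, -30, 30)))
        grad = xb.T @ (p - y) / len(y) + l2 * np.r_[w[:-1], 0.0]
        w -= lr * grad
    return w


def predict_proba(w, x):
    """Predicted probability of class 1 for each row of x."""
    z = np.hstack([x, np.ones((len(x), 1))]) @ w
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def auc(y, score):
    """ROC AUC via the Mann-Whitney statistic; tied scores count as 0.5."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def loocv_auc(x, y, reducer, k=10, seed=0):
    """Leave-one-case-out CV; every step is fitted on the training fold only."""
    rng = np.random.default_rng(seed)
    scores = np.empty(len(y))
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        xtr, xte = standardize(x[train], x[i:i + 1])
        xtr, xte = standardize(*reducer(xtr, xte, k, rng))  # reduce, then rescale
        xtr, ytr = smote(xtr, y[train], rng)  # oversample the training fold only
        scores[i] = predict_proba(fit_logreg(xtr, ytr), xte)[0]
    return auc(y, scores)


if __name__ == "__main__":
    # Synthetic stand-in (NOT the study's data): 188 cases x 325 features,
    # 141 labelled 1 (PM) and 47 labelled 0, with a weak signal in 20 features.
    rng = np.random.default_rng(42)
    y = np.r_[np.ones(141, dtype=int), np.zeros(47, dtype=int)]
    x = rng.normal(size=(188, 325))
    x[:, :20] += 0.5 * y[:, None]
    for name, reducer in [("PCA", pca_reduce), ("RPA", random_project)]:
        print(name, round(loocv_auc(x, y, reducer, k=10), 3))
```

One design difference stands out. The random projection matrix is drawn without looking at the data, so RPA needs no fitting. PCA's basis, by contrast, is estimated from each training fold and favours high-variance directions, which need not be the discriminative ones. The AUCs printed on synthetic data are only a smoke test and say nothing about the values reported in the study.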