A novel approach to identify subtype-specific network biomarkers of breast cancer survivability

2020 
Increasing the survival rates for breast cancer has attracted significant research interest. Current studies show that a small subset of gene markers can predict survivability for people with different breast cancer subtypes. In these studies, however, the selected genes are not necessarily functionally related, and hence they may not correctly indicate the molecular mechanism behind breast cancer survivability. Several studies have also shown that there is very low overlap between the biomarker subsets reported for the same cancer. To improve the robustness of classification performance and the stability of detected biomarkers, recent methods incorporate existing knowledge of relations between genes into the classifier by aggregating functionally related genes into discriminative gene subnetworks called network biomarkers. In this paper, using a dataset of patients with different subtypes of breast cancer, we devised a novel network-based approach that integrates a protein–protein interaction (PPI) network with gene expression data to (1) identify the network biomarkers (metagenes) of breast cancer survivability and (2) predict the survivability of breast cancer patients based on their breast cancer subtype. Our method uses the concept of seed genes to identify network biomarkers, ADASYN to address class imbalance, and random forest to predict the survivability of patients. We obtained the best classification performance at a distance of three from the seed gene protein, where the G-mean, F1-measure, and accuracy were 0.900, 0.800, and 90.34%, respectively. The maximum size of a network biomarker at distance 3 is 9. A maximum of 34 genes is needed to accurately predict the survivability of breast cancer patients. This method can be used to identify the survivability of breast cancer patients using gene relationship networks.
It has high prediction performance, including specificity and sensitivity, for both cohorts: survivors and deceased patients.
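The abstract does not spell out how the subnetworks are grown or how the metrics are computed, so the following is only a minimal sketch under stated assumptions. It assumes that "distance" means hop count in the PPI graph, that candidate genes are collected by breadth-first search from a seed gene, and that G-mean is the usual geometric mean of sensitivity and specificity. The gene names, the toy network, and both function names are hypothetical and are not taken from the paper.

```python
import math
from collections import deque


def genes_within_distance(ppi, seed, max_dist):
    """Breadth-first search over an adjacency dict.

    Returns {gene: hop distance from seed} for every gene reachable
    in at most max_dist hops (assumed meaning of "distance").
    """
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        gene = queue.popleft()
        if dist[gene] == max_dist:
            continue  # do not expand beyond the distance limit
        for neighbor in ppi.get(gene, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[gene] + 1
                queue.append(neighbor)
    return dist


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy PPI network: a chain A-B-D-E-F with a side branch A-C.
toy_ppi = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}

candidates = genes_within_distance(toy_ppi, "A", 3)
score = g_mean([1, 1, 0, 0], [1, 0, 0, 0])
```

In this toy network, gene F lies four hops from the seed A and is therefore excluded at distance 3. G-mean is a natural companion to ADASYN here because it stays low whenever either the survivor or the deceased class is predicted poorly, which plain accuracy can hide on imbalanced data.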