Identification of gene signatures for classifying of breast cancer subtypes using protein interaction database and support vector machines

2015 
Many studies have used microarray gene expression data to classify breast cancer subtypes. However, classification accuracy has often been unacceptable, even when the algorithms were applied to only a single dataset. The main challenges are choosing an appropriate algorithm for each step of the procedure, applying useful bioinformatics databases, accounting for interactions among genes, and combining the analytical steps properly. In this study, we propose a three-step solution. In the first step, a filter feature selection method produces a small set of informative genes. In the second step, these primary selected genes are mapped onto the protein-protein interaction network, and the gene set is extended according to the links among the corresponding proteins. Thus, a portion of the genes pruned in the first step is added back to the primary set of selected genes. In the final step, support vector machine-based recursive feature elimination (SVM-RFE) identifies the final set of informative genes. We then compared the proposed algorithm with decision tree methods on the same datasets. The proposed procedure was evaluated on two publicly available DNA microarray datasets comprising 456 breast cancer samples. With the JMI method in the first step, the proposed algorithm reached 100% accuracy in predicting the Luminal B subtype. In conclusion, the proposed method showed an appealing improvement in classification accuracy for a multiclass prediction problem: it predicts subtypes with an overall accuracy greater than 91.2%, whereas the decision tree method achieves 78.6%.
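The three-step procedure can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper. The data are synthetic (scikit-learn's `make_classification`), the interaction map `ppi_neighbors` is a hypothetical toy dictionary rather than a real protein-protein interaction database, and univariate mutual information (`mutual_info_classif`) stands in for the JMI filter, which scikit-learn does not provide. The function names and the values of `k` and `n_final` are likewise illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.svm import SVC


def filter_step(X, y, k):
    """Step 1: rank genes by mutual information with the label, keep the top k.

    Assumption: univariate mutual information is a stand-in for the JMI filter.
    """
    scores = mutual_info_classif(X, y, random_state=0)
    return set(np.argsort(scores)[::-1][:k].tolist())


def expand_with_ppi(selected, ppi_neighbors):
    """Step 2: add back genes whose proteins interact with a selected gene."""
    expanded = set(selected)
    for gene in selected:
        expanded.update(ppi_neighbors.get(gene, ()))
    return expanded


def svm_rfe_step(X, y, candidates, n_final):
    """Step 3: linear-SVM recursive feature elimination over the candidates."""
    cols = sorted(candidates)
    n_keep = min(n_final, len(cols))
    rfe = RFE(SVC(kernel="linear"), n_features_to_select=n_keep, step=1)
    rfe.fit(X[:, cols], y)
    return [c for c, keep in zip(cols, rfe.support_) if keep]


# Synthetic stand-in for an expression matrix: 120 samples, 40 "genes", 4 classes.
X, y = make_classification(
    n_samples=120, n_features=40, n_informative=8,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Hypothetical interaction map: gene i is linked to gene (i + 1) mod 40.
ppi_neighbors = {i: [(i + 1) % 40] for i in range(40)}

selected = filter_step(X, y, k=10)
expanded = expand_with_ppi(selected, ppi_neighbors)
final_genes = svm_rfe_step(X, y, expanded, n_final=5)
print("filtered:", sorted(selected))
print("expanded:", sorted(expanded))
print("final:", final_genes)
```

The expansion step only ever grows the candidate set, so genes dropped by the filter can re-enter before SVM-RFE makes the final cut. With a real dataset, `ppi_neighbors` would be built from an interaction database and accuracy would be estimated with cross-validation, neither of which is shown here.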