A Comparative Study of Feature Selection Methods on Genomic Datasets

2019 
Feature selection plays an important role in reducing the size of datasets by choosing the most informative features and discarding the rest. The use of feature selection in microarray datasets for detecting cancer is widely investigated. In this paper we provide a series of comparisons between perturbation-based feature selection (PFS) and traditional methods, such as principal component analysis (PCA), correlation based feature selection (CFS), and least-angle regression (LARS), and more recent methods, such as Hilbert-Schmidt independence criterion Lasso (HSIC-Lasso), minimum redundancy maximum relevance (mRMR), and a feature selection using support vector machines (FS-SVM). The performance of each method is demonstrated by conducting a series of comparisons on genomic cancer datasets, as well as, inflammatory bowel disease datasets. The experiments show that PFS and HSIC-Lasso are both scalable to large datasets.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    37
    References
    0
    Citations
    NaN
    KQI
    []