A Comparative Study of Feature Selection Methods on Genomic Datasets

Javad Rahimipour Anaraki, Hamid Usefi

2019

Feature selection plays an important role in reducing the size of datasets by choosing the most informative features and discarding the rest. The use of feature selection on microarray datasets for cancer detection has been widely investigated. In this paper, we provide a series of comparisons between perturbation-based feature selection (PFS) and traditional methods, such as principal component analysis (PCA), correlation-based feature selection (CFS), and least-angle regression (LARS), as well as more recent methods, such as Hilbert-Schmidt independence criterion Lasso (HSIC-Lasso), minimum redundancy maximum relevance (mRMR), and feature selection using support vector machines (FS-SVM). The performance of each method is demonstrated through a series of comparisons on genomic cancer datasets as well as inflammatory bowel disease datasets. The experiments show that PFS and HSIC-Lasso are both scalable to large datasets.
