Separation of pulsar signals from noise using supervised machine learning algorithms

2018 
Abstract We evaluate the performance of four different machine learning (ML) algorithms: an Artificial Neural Network Multi-Layer Perceptron (ANN MLP), Adaboost, Gradient Boosting Classifier (GBC), and XGBoost, for the separation of pulsars from radio frequency interference (RFI) and other sources of noise, using a dataset obtained from the post-processing of a pulsar search pipeline. This dataset was previously used for the cross-validation of the SPINN -based machine learning engine, obtained from the reprocessing of the HTRU-S survey data (Morello et al., 2014). We have used the Synthetic Minority Over-sampling Technique (SMOTE) to deal with high-class imbalance in the dataset. We report a variety of quality scores from all four of these algorithms on both the non-SMOTE and SMOTE datasets. For all the above ML methods, we report high accuracy and G-mean for both the non-SMOTE and SMOTE cases. We study the feature importances using Adaboost, GBC, and XGBoost and also from the minimum Redundancy Maximum Relevance approach to report algorithm-agnostic feature ranking. From these methods, we find that the signal to noise of the folded profile to be the best feature. We find that all the ML algorithms report FPRs about an order of magnitude lower than the corresponding FPRs obtained in Morello et al. (2014), for the same recall value.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    88
    References
    31
    Citations
    NaN
    KQI
    []