Comparison of eight filter-based feature selection methods for monthly streamflow forecasting – three case studies on CAMELS data sets

2020 
Abstract Recently, there has been an increased emphasis on employing data-driven models to forecast streamflow. However, the performance of filter-based feature selection (FFS) methods in data-driven monthly streamflow forecasting has not been studied in detail. In this study, we investigated the effectiveness of eight common FFS methods, namely, linear Pearson correlation, partial linear Pearson correlation (PCI), mutual information (MI), conditional MI, partial MI, maximal relevance minimal redundancy Pearson correlation, maximal relevance minimal redundancy MI and the gamma test, on three regression models, namely multiple linear regression (MLR), ensemble extreme learning machine (enELM) and k-nearest neighbor (KNN) regression, for real-world one-month-ahead streamflow forecasting. The study was conducted on three cases from the Catchment Attributes and Meteorology for Large-sample Studies (CAMELS) data sets. Furthermore, two termination criterion (TC) methods, the Hampel test and resampling, were compared. The results highlight three important findings. First, no FFS method was dominant when coupled with enELM or KNN. Second, when resampling was applied to select a final model from the candidate combinations of the eight FFS methods and three regression models, PCI was the most favorable FFS method for the final model. Finally, the Hampel test TC was superior to the resampling TC in terms of stability and resistance to overfitting. These findings have significant practical reference value for real-world monthly streamflow forecasting.