A Distributed Integrated Feature Selection Scheme for Column Subset Selection

2021 
Most existing distributed feature selection schemes ignore the quality of the subsets mapped to the computational nodes, which wastes time and hardware resources. We propose a distributed integrated feature selection scheme (DIFS) with Subset Quality Evaluation (SQE). SQE studies the relationship between the quality of a subset and the number of features selected from it, which helps shorten feature selection time. We implement the scheme for the Column Subset Selection (CSS) problem, integrating a CSS algorithm into DIFS and using information entropy as the SQE metric. We prove that, in ideal situations, the speedup of DIFS over the centralized algorithm can reach m^3, where m is the number of computational nodes, and we give a well-bounded approximation guarantee for the solution of the CSS problem. Extensive experiments on eight data sets verify the performance of the scheme. The experimental results demonstrate the effectiveness of SQE and the substantial speedup DIFS can achieve, although the reconstruction error increases slightly in some situations. Additional experiments on classification tasks show that DIFS outperforms existing state-of-the-art distributed algorithms.
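The idea of tying the number of features selected from each subset to that subset's quality can be illustrated with a minimal sketch. The abstract does not say how the entropy is computed or how quality maps to a selection count, so everything below is an assumption made for illustration: the histogram-based entropy estimate, the mean-entropy subset score, the proportional budget rule, and the names `column_entropy`, `subset_quality`, and `allocate_budget` are all hypothetical and are not taken from the paper.

```python
import numpy as np

def column_entropy(col, bins=10):
    """Shannon entropy (in bits) of one column, estimated from a histogram.
    Assumption: equal-width binning; the paper's estimator is not given."""
    counts, _ = np.histogram(col, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def subset_quality(X, cols, bins=10):
    """Hypothetical SQE score: mean entropy of the columns in one subset."""
    return float(np.mean([column_entropy(X[:, j], bins) for j in cols]))

def allocate_budget(scores, k):
    """Split a total budget of k features across subsets in proportion
    to their scores, using largest-remainder rounding so the counts sum to k.
    Assumption: proportional allocation; it does not cap a count at the
    subset's own size."""
    scores = np.asarray(scores, dtype=float)
    if scores.sum() == 0:
        shares = np.full(len(scores), k / len(scores))
    else:
        shares = k * scores / scores.sum()
    alloc = np.floor(shares).astype(int)
    remainder = k - alloc.sum()
    # Give the leftover units to the subsets with the largest fractional parts.
    order = np.argsort(-(shares - alloc))
    alloc[order[:remainder]] += 1
    return alloc

# Usage: partition 12 columns across m = 3 nodes, score each partition,
# then decide how many of k = 6 features each node should select.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
partitions = np.array_split(np.arange(X.shape[1]), 3)
scores = [subset_quality(X, cols) for cols in partitions]
budget = allocate_budget(scores, k=6)
```

In this sketch each node would then run a CSS algorithm on its own partition with its allotted count, so low-scoring partitions consume less selection effort.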