Feature construction as a bi-level optimization problem

2020 
Feature selection and feature construction are important preprocessing techniques in data mining. They not only reduce dimensionality but can also improve classification accuracy and efficiency. While feature selection consists of selecting a subset of relevant features from the original feature set, feature construction generates new high-level features, called constructed features, each of which is a combination of a subset of the original features. Based on these definitions, feature construction can be seen as a bi-level optimization problem in which the feature subset is defined first and the corresponding (near) optimal combination of the selected features is then found. Motivated by this observation, we propose in this paper a bi-level evolutionary approach for feature construction. The basic idea of our algorithm, named bi-level feature construction genetic algorithm (BFC-GA), is to evolve an upper-level population for the task of feature selection while optimizing the feature combinations at the lower level by evolving a follower population. It is worth noting that for each upper-level individual (feature subset), a whole lower-level population is optimized to find the corresponding (near) optimal feature combination (constructed feature). In this way, BFC-GA is able to output a set of optimized constructed features that can be very informative to the considered classifier. A detailed experimental study has been conducted on a set of commonly used datasets with varying dimensions. The statistical analysis of the obtained results shows that our bi-level feature construction approach is competitive with, and outperforms, many state-of-the-art algorithms.
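The nested structure described above can be illustrated with a minimal sketch. It is not the authors' BFC-GA implementation, and several details are assumptions made only for illustration: the operator set `OPS`, the left-to-right folding in `construct`, the use of absolute Pearson correlation (`abs_corr`) as a stand-in for classifier-based fitness, the mutation-only variation, and the toy dataset. The sketch shows only the key idea that each upper-level individual (a feature mask) is scored by running a complete lower-level search over feature combinations.

```python
import random

# Hypothetical operator set; the abstract does not list the operators used.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def construct(X, subset, ops):
    """Fold the selected columns left to right with the chosen operators."""
    values = [row[subset[0]] for row in X]
    for col, op in zip(subset[1:], ops):
        values = [OPS[op](v, row[col]) for v, row in zip(values, X)]
    return values

def abs_corr(f, y):
    """Absolute Pearson correlation, an assumed proxy for classifier accuracy."""
    n = len(f)
    mf, my = sum(f) / n, sum(y) / n
    cov = sum((a - mf) * (b - my) for a, b in zip(f, y))
    vf = sum((a - mf) ** 2 for a in f)
    vy = sum((b - my) ** 2 for b in y)
    return abs(cov) / (vf * vy) ** 0.5 if vf > 0 and vy > 0 else 0.0

def lower_level(X, y, subset, rng, pop=8, gens=5):
    """Follower search: evolve operator sequences for one fixed feature subset."""
    k = len(subset) - 1
    names = sorted(OPS)

    def score(ops):
        return abs_corr(construct(X, subset, ops), y)

    population = [[rng.choice(names) for _ in range(k)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=score, reverse=True)
        parents = population[: pop // 2]          # keep the better half
        children = []
        for parent in parents:
            child = list(parent)
            if k > 0:                             # mutate one operator
                child[rng.randrange(k)] = rng.choice(names)
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return score(best), best

def upper_level(X, y, rng, pop=8, gens=5):
    """Leader search: evolve feature masks, scoring each via a lower-level run."""
    n = len(X[0])

    def repair(mask):
        if not any(mask):                         # never allow an empty subset
            mask[rng.randrange(n)] = True
        return mask

    def evaluate(mask):
        subset = [i for i, keep in enumerate(mask) if keep]
        fitness, ops = lower_level(X, y, subset, rng)
        return fitness, subset, ops

    population = [repair([rng.random() < 0.5 for _ in range(n)]) for _ in range(pop)]
    best = (-1.0, [], [])
    for _ in range(gens):
        scored = sorted(((evaluate(m), m) for m in population),
                        key=lambda item: item[0][0], reverse=True)
        if scored[0][0][0] > best[0]:
            best = scored[0][0]
        parents = [m for _, m in scored[: pop // 2]]
        children = []
        for parent in parents:
            child = list(parent)
            i = rng.randrange(n)
            child[i] = not child[i]               # flip one selection bit
            children.append(repair(child))
        population = parents + children
    return best

# Toy data (an assumption): the label depends on the sign of x0 * x1.
data_rng = random.Random(0)
X = [[data_rng.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
y = [1 if row[0] * row[1] > 0 else 0 for row in X]
best_fitness, best_subset, best_ops = upper_level(X, y, random.Random(1))
print(best_fitness, best_subset, best_ops)
```

Because every upper-level evaluation triggers a full lower-level run, the cost grows with the product of the two population sizes and generation counts, which is why the sketch keeps both deliberately small.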