Learning from small datasets containing nominal attributes

2018 
Abstract In many small-data-learning problems, owing to the incomplete data structure, explicit information for decision makers is limited. Although machine learning algorithms are extensively applied to extract knowledge, most of them are developed without considering whether the training sets can fully represent the population properties. Focusing on small data which contains nominal inputs and continuous outputs, this paper develops an effective sample generating procedure based on fuzzy theories to tackle the learning issue by data preprocessing. According to the derived fuzzy relations between categories and continuous outputs, the possibilities of the combinations of categories (virtual samples) can be aggregated when continuous outputs are given. Proper virtual samples are further selected by using fuzzy alpha-cut on the possibility distributions, and these are added to the training sets to form new ones. In the experiment, sixteen datasets taken from the UC Irvine Machine Learning Repository are examined with back-propagation neural networks and support vector regressions. The results reveal that the forecasting accuracies of the two models are significantly improved when they are built with the proposed new training sets. Moreover, the results also indicate the proposed method outperforms bootstrap aggregating and the synthetic minority over-sampling technique-Nominal-Continuous with the greatest amount of statistical support.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    36
    References
    5
    Citations
    NaN
    KQI
    []