Out-of-time cross-validation strategies for classification in the presence of dataset shift

2021 
Model selection is a critical step in extracting knowledge from datasets. It is usually performed with partitioning strategies such as cross-validation, in which the training and test subsets are selected at random. However, the literature suggests that this is not the best approach in changing environments because of the risk of data obsolescence. This paper proposes novel out-of-time cross-validation mechanisms for model selection and evaluation in binary classification. Our approach extends the reasoning behind the rolling forecasting origin method for time-series analysis, providing an effective methodology for obtaining the prequential performance of a classifier on an out-of-time test sample. The proposed method also includes a forgetting mechanism that identifies outdated samples to be ignored during model training. Experiments on simulated and real-world datasets demonstrate the advantages of our approach over several well-known validation strategies.
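
The abstract does not spell out the exact splitting rule, so the following Python sketch only illustrates the general idea: the forecasting origin rolls forward through chronologically ordered data, each model is trained on a (possibly forgetting-limited) window of past samples, and scored prequentially on the next out-of-time block. The equal-sized blocks, the fixed-length forgetting window, and the use of logistic regression with ROC AUC are assumptions made for this example, not the paper's actual design.

```python
# Minimal sketch of rolling-origin out-of-time validation with a
# sliding-window "forgetting" mechanism. The windowing rule and model
# choice are illustrative assumptions, not the paper's exact method.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def rolling_oot_scores(X, y, n_splits=5, window=None):
    """Prequential evaluation on chronologically ordered data.

    X, y are assumed sorted by time. At each split the model is trained
    on the most recent `window` samples before the origin (all past
    samples if window is None) and tested on the next out-of-time block.
    """
    n = len(X)
    fold = n // (n_splits + 1)          # size of each out-of-time test block
    scores = []
    for k in range(1, n_splits + 1):
        origin = k * fold               # forecasting origin rolls forward
        start = 0 if window is None else max(0, origin - window)
        train = slice(start, origin)    # forgetting: drop samples before `start`
        test = slice(origin, min(origin + fold, n))
        clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        scores.append(roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1]))
    return scores

# Toy usage: a gradually drifting decision boundary, comparing
# full-history training against a forgetting window.
rng = np.random.default_rng(0)
n = 3000
X = rng.normal(size=(n, 2))
drift = np.linspace(0, 3, n)            # boundary orientation changes over time
y = (X[:, 0] + drift * X[:, 1] > 0).astype(int)
print("full history:", np.round(rolling_oot_scores(X, y), 3))
print("window=500  :", np.round(rolling_oot_scores(X, y, window=500), 3))
```

In this simulation, a forgetting window that discards old samples lets each fold's model fit the boundary orientation currently in effect, which illustrates the intuition behind ignoring obsolete data when the environment is changing.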