Algorithmic Splitting: A Method for Dataset Preparation

2021 
The datasets that appear in publications are curated and have been split into training, testing and validation sub-datasets by domain experts. Consequently, machine learning models typically perform well on such hand-split datasets. Preparing real-world datasets into curated training, testing and validation sub-datasets, by contrast, requires extensive effort. Usually, random splits are drawn repeatedly, with a model trained and evaluated on each, until a good score on the evaluation metrics is reached. In this paper, an algorithmic method is proposed for preparing the sub-dataset splits for machine learning models. The objective of the proposed method is to obtain evenly representative splits of the dataset in a standard, algorithmic way that reduces the perplexity of random splitting.
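The abstract does not describe the proposed algorithm itself, so the following is not the paper's method. It is only a minimal sketch of one common way to get "evenly representative" splits without repeated random trials: a seeded, per-class (stratified) split. It assumes each sample has a categorical, sortable label; the function name `stratified_split`, the 80/10/10 fractions and the seed are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train, validation, test) lists of sample indices.

    Illustrative assumption: every class is divided with the same
    fractions, so each subset keeps roughly the class proportions of
    the full dataset. This is not the method proposed in the paper.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible

    # Group sample indices by their class label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_label):  # sorted for a deterministic class order
        indices = by_label[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to test
    return train, val, test


# Usage: an imbalanced toy dataset with 80 samples of "a" and 20 of "b".
labels = ["a"] * 80 + ["b"] * 20
train, val, test = stratified_split(labels)
```

Because the split is computed once from the labels and a fixed seed, the same dataset always yields the same subsets, and the minority class appears in every subset in proportion to its overall frequency.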