The Datasets Dilemma: How Much Do We Really Know About Recommendation Datasets?

2022 
Recommender systems have attracted sustained interest from both academia and industry because of their importance and practicality. However, several recent papers have pointed out critical issues with how recommender systems are evaluated. In the same vein, this paper takes an in-depth look at a fundamental but often neglected aspect of the evaluation procedure, i.e., the datasets themselves. To do so, we adopt a systematic and comprehensive approach to understanding the datasets used for implicit-feedback-based top-K recommendation. We start by examining recent papers from top-tier conferences to find out how different datasets have been utilised thus far. Next, we look at the characteristics of these datasets to understand their similarities and differences. Finally, we conduct an empirical study to determine whether the choice of datasets used for evaluation can influence the observations and conclusions obtained. Our findings suggest that greater attention needs to be paid to the selection of datasets used for evaluating recommender systems in order to improve the robustness of the obtained results.