On Target Item Sampling in Offline Recommender System Evaluation

2020 
Target selection is a basic yet often implicit decision in the configuration of offline recommendation experiments. In this paper we research the impact of target sampling on the outcome of comparative recommender system evaluation. Specifically, we undertake a detailed analysis considering the informativeness and consistency of experiments across the target size axis. We find that comparative evaluation using reduced target sets contradicts in many cases the corresponding outcome using large targets, and we provide a principled explanation for these disagreements. We further seek to determine which among the contradicting results may be more reliable. Through comparison to unbiased evaluation, we find that minimum target sets incur in substantial distortion in pairwise system comparisons, while maximum sets may not be ideal either, and better options may lie in between the extremes. We further find means for informing the target size setting in the common case where unbiased evaluation is not possible, by an assessment of the discriminative power of evaluation, that remarkably aligns with the agreement with unbiased evaluation.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    27
    References
    6
    Citations
    NaN
    KQI
    []