Joint Policy-Value Learning for Recommendation

Olivier Jeunen, David Rohde, Flavian Vasile, Martin Bompaire

2020

Conventional approaches to recommendation often do not explicitly take into account information on previously shown recommendations and their recorded responses. One reason is that we do not know the outcome of actions the system did not take, so learning directly from such logs is not straightforward. Several methods for off-policy or counterfactual learning have been proposed in recent years, but their efficacy for the recommendation task remains understudied. Evaluating them is non-trivial, given the limitations of offline datasets and the fact that most academic researchers lack access to online experiments. Simulation environments can provide a reproducible solution to this problem.

Keywords:

Data science
Counterfactual thinking
Computer science
