Meta-Thompson Sampling

Branislav Kveton, Mikhail Konobeev, Manzil Zaheer, Chih-Wei Hsu, Martin Mladenov, Craig Boutilier, Csaba Szepesvári

2021

Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of Thompson sampling that learns to explore better as it interacts with bandit instances drawn from an unknown prior. Because the algorithm meta-learns this prior, we call it MetaTS. We propose several efficient implementations of MetaTS and analyze it in Gaussian bandits. Our analysis shows the benefit of meta-learning and is of broader interest because we derive a novel prior-dependent Bayes regret bound for Thompson sampling. Our theory is complemented by an empirical evaluation, which shows that MetaTS quickly adapts to the unknown prior.
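
To make the idea of meta-learning the prior concrete, the following is a minimal sketch of the loop in a Gaussian bandit. It is not the authors' implementation. It assumes that the prior variance and the reward noise variance are known, that only the prior mean is unknown, and that a Gaussian meta-prior is placed over that mean. The function names `run_ts_task` and `meta_ts`, the parameter names, and all default values are hypothetical and chosen for illustration. The meta-posterior update uses the fact that the empirical mean of an arm pulled T times is, marginally, Gaussian around the unknown prior mean with variance `prior_var + noise_var / T`; the sketch treats the pull counts as given.

```python
import numpy as np


def run_ts_task(rng, theta, prior_mean, prior_var, noise_var, n_rounds):
    """Gaussian Thompson sampling on one bandit instance with a given prior."""
    k = len(theta)
    pulls = np.zeros(k)
    reward_sums = np.zeros(k)
    regret = 0.0
    for _ in range(n_rounds):
        # Conjugate Gaussian posterior over each arm's mean reward.
        post_var = 1.0 / (1.0 / prior_var + pulls / noise_var)
        post_mean = post_var * (prior_mean / prior_var + reward_sums / noise_var)
        # Sample one mean per arm and pull the arm with the largest sample.
        arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        reward = rng.normal(theta[arm], np.sqrt(noise_var))
        pulls[arm] += 1
        reward_sums[arm] += reward
        regret += theta.max() - theta[arm]
    return pulls, reward_sums, regret


def meta_ts(rng, true_prior_mean, n_tasks=20, n_rounds=200,
            prior_var=0.01, noise_var=1.0, meta_var=1.0):
    """Sketch of meta-learning the prior mean across tasks (assumed setup)."""
    true_prior_mean = np.asarray(true_prior_mean, dtype=float)
    k = len(true_prior_mean)
    # Meta-posterior over the prior mean, stored per arm in natural parameters.
    # The meta-prior is N(0, meta_var) for every arm (an assumption).
    precision = np.full(k, 1.0 / meta_var)
    weighted_sum = np.zeros(k)
    regrets = []
    for _ in range(n_tasks):
        meta_post_var = 1.0 / precision
        meta_post_mean = weighted_sum * meta_post_var
        # Sample a candidate prior mean from the meta-posterior.
        sampled_prior_mean = rng.normal(meta_post_mean, np.sqrt(meta_post_var))
        # Draw a new bandit instance from the true (unknown) prior.
        theta = rng.normal(true_prior_mean, np.sqrt(prior_var))
        pulls, sums, regret = run_ts_task(
            rng, theta, sampled_prior_mean, prior_var, noise_var, n_rounds)
        regrets.append(regret)
        # Update the meta-posterior with each pulled arm's empirical mean.
        pulled = pulls > 0
        obs_var = prior_var + noise_var / pulls[pulled]
        precision[pulled] += 1.0 / obs_var
        weighted_sum[pulled] += (sums[pulled] / pulls[pulled]) / obs_var
    return weighted_sum / precision, 1.0 / precision, np.array(regrets)
```

A call such as `meta_ts(np.random.default_rng(0), [0.0, 0.5, 1.0])` returns the meta-posterior mean and variance per arm, together with the regret incurred in each task. In this sketch the meta-posterior variance can only shrink as tasks accumulate, which is the mechanism by which the sampled prior concentrates.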

Keywords:

Bayes' theorem
Gaussian
Thompson sampling
Computer science
Artificial intelligence
Online learning
Regret
