Thompson Sampling for Unimodal Bandits

Long Yang, Zhao Li, Zehong Hu, Shasha Ruan, Shijian Li, Gang Pan, Hongyang Chen

2021

In this paper, we propose a Thompson Sampling algorithm for unimodal bandits, where the expected reward is unimodal over the partially ordered arms. To better exploit the unimodal structure, our algorithm does not explore the entire decision space at each step; instead, it makes decisions according to the posterior distribution only within the neighborhood of the arm with the highest empirical mean estimate. We prove that, for Bernoulli rewards, the regret of our algorithm matches the lower bound for unimodal bandits, so the algorithm is asymptotically optimal. For Gaussian rewards, the regret of our algorithm is $\mathcal{O}(\log T)$, which is far better than that of standard Thompson Sampling algorithms. Extensive experiments on both synthetic data sets and real-world applications demonstrate the effectiveness of the proposed algorithm.
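The decision rule described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes Bernoulli rewards with a Beta(1, 1) prior on every arm, and it assumes the arms lie on a line, so that the neighbors of arm i are i - 1 and i + 1. The function name `unimodal_thompson_sampling` and its parameters are hypothetical. At each step, the sketch finds the empirical leader, draws one posterior sample for the leader and each of its neighbors, and plays the arm with the largest draw. The full algorithm in the paper may include further details, such as a special rule for when the leader itself is played, that are not reproduced here.

```python
import random


def unimodal_thompson_sampling(arm_means, horizon, seed=0):
    """Bernoulli Thompson Sampling restricted to the empirical leader's neighborhood.

    Illustrative assumptions, not taken from the paper: the arms lie on a line
    (the neighbors of arm i are i - 1 and i + 1), every arm starts with a
    Beta(1, 1) prior, and rewards are simulated as Bernoulli(arm_means[i]).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    successes = [0] * k
    pulls = [0] * k

    for _ in range(horizon):
        # Empirical means; arms never pulled are treated as 0.0 here.
        emp = [successes[i] / pulls[i] if pulls[i] else 0.0 for i in range(k)]
        # Leader: the arm with the highest empirical mean (ties -> lowest index).
        leader = max(range(k), key=lambda i: emp[i])
        # Candidate set: the leader plus its neighbors on the line.
        candidates = [i for i in (leader - 1, leader, leader + 1) if 0 <= i < k]
        # One draw from each candidate's Beta posterior; play the largest draw.
        draws = {
            i: rng.betavariate(successes[i] + 1, pulls[i] - successes[i] + 1)
            for i in candidates
        }
        arm = max(draws, key=draws.get)
        # Observe a simulated Bernoulli reward and update the counts.
        reward = 1 if rng.random() < arm_means[arm] else 0
        successes[arm] += reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    means = [0.1, 0.3, 0.5, 0.7, 0.9, 0.6, 0.2]  # unimodal, peak at arm 4
    print(unimodal_thompson_sampling(means, horizon=5000))
```

Under this sketch, each step samples from at most three posteriors, regardless of how many arms there are. Because the expected reward is unimodal, the leader can climb toward the best arm one neighbor at a time. On the example instance, most pulls would typically go to arm 4. For the Gaussian case that the paper also analyzes, the Beta posterior would be replaced by a Gaussian one. That change is not shown here.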
