Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour

2021

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards, and the reward-independent delay setting. Our main contributions are algorithms that achieve near-optimal regret in each of the settings, with an additional additive dependence on the quantiles of the delay distribution. Our results make no assumptions on the delay distributions: in particular, we do not assume that they come from any parametric family of distributions, and we allow for unbounded support and expectation. We further allow for infinite delays, in which case the algorithm might occasionally not observe any feedback.
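To make the feedback model concrete, here is a minimal Python sketch under stated assumptions: Bernoulli rewards and reward-independent delays, where a delay of `math.inf` means the feedback is never observed. It is a plain UCB1 baseline that updates its statistics only when delayed feedback actually arrives; it is NOT the algorithm proposed in the paper. The helper `delay_quantile` computes an empirical quantile of a delay sample (infinite delays allowed), the kind of quantity the abstract says enters the regret bound additively. All function names and parameters here are hypothetical, chosen for illustration only.

```python
import math
import random
from typing import Callable, List, Tuple


def delay_quantile(delays: List[float], q: float) -> float:
    """Empirical q-quantile: the smallest sampled delay d with at least a q-fraction of samples <= d.

    Entries equal to math.inf model feedback that never arrives.
    """
    if not delays or not 0.0 < q <= 1.0:
        raise ValueError("need a non-empty sample and q in (0, 1]")
    ordered = sorted(delays)
    k = math.ceil(q * len(ordered))  # how many samples must lie at or below the quantile
    return ordered[k - 1]


def ucb_with_delayed_feedback(
    means: List[float],
    horizon: int,
    sample_delay: Callable[[random.Random], float],
    seed: int = 0,
) -> List[int]:
    """Run UCB1 on Bernoulli arms, updating statistics only when delayed feedback arrives.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k          # times each arm was played
    observed = [0] * k       # feedback actually received per arm
    reward_sums = [0.0] * k  # sum of received rewards per arm
    pending: List[Tuple[float, int, float]] = []  # (arrival_round, arm, reward) still in transit

    def ucb_index(a: int, t: int) -> float:
        if observed[a] == 0:
            return math.inf  # no feedback yet: treat the arm as maximally optimistic
        mean = reward_sums[a] / observed[a]
        return mean + math.sqrt(2.0 * math.log(t + 1) / observed[a])

    for t in range(horizon):
        # Deliver every observation whose delay has elapsed by round t.
        in_transit = []
        for arrival, arm, reward in pending:
            if arrival <= t:
                observed[arm] += 1
                reward_sums[arm] += reward
            else:
                in_transit.append((arrival, arm, reward))
        pending = in_transit

        # Highest index wins; among ties (e.g. several unobserved arms) prefer the least-pulled arm.
        arm = max(range(k), key=lambda a: (ucb_index(a, t), -pulls[a]))
        pulls[arm] += 1

        reward = 1.0 if rng.random() < means[arm] else 0.0
        delay = sample_delay(rng)
        if delay != math.inf:  # an infinite delay means this feedback is never observed
            pending.append((t + delay, arm, reward))

    return pulls


if __name__ == "__main__":
    # Hypothetical delay distribution: 10% of feedback is lost, the rest arrives within 20 rounds.
    def lossy_delay(rng: random.Random) -> float:
        return math.inf if rng.random() < 0.1 else rng.randint(0, 20)

    print(ucb_with_delayed_feedback([0.2, 0.8], horizon=2000, sample_delay=lossy_delay))
    print(delay_quantile([3, 1, math.inf, 2], 0.5))
```

Dropping infinitely delayed observations at pull time, rather than keeping them in the pending list forever, keeps the bookkeeping bounded; the learner never distinguishes "lost" from "not yet arrived", which matches the abstract's description of occasionally missing feedback.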

Keywords:

Quantile
Computer science
Mathematical optimization
Distribution (mathematics)
Regret
Parametric family
