One-armed bandit models with continuous and delayed responses

Xikui Wang, Mikelis G. Bickis

2003

One-armed bandit processes with continuous and delayed responses are formulated as controlled stochastic processes following the Bayesian approach. It is shown that, under some regularity conditions, a Gittins-like index exists; it is the limit of a monotonic sequence of break-even values that characterize the optimal initial selection of arms in finite-horizon bandit processes. Furthermore, there is an optimal stopping solution when all observations on the unknown arm are complete. The results are illustrated with a bandit model having exponentially distributed responses, in which case the controlled stochastic process becomes a Markov decision process, the Gittins-like index is the Gittins index, and the Gittins index strategy is optimal. Copyright Springer-Verlag 2003
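
The exponential case can be illustrated with a small sketch of the Bayesian state update. It is not taken from the paper and rests on assumptions the abstract does not state: the unknown arm's responses are exponential with an unknown rate, the prior on that rate is a conjugate Gamma(shape, rate) distribution, and a delayed response is treated as a right-censored observation that contributes only its elapsed time so far. Under those assumptions the posterior stays in the Gamma family, so two numbers summarize everything learned, which is one way to see why the controlled process can reduce to a Markov decision process. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def update_gamma_posterior(
    shape: float,
    rate: float,
    completed: Sequence[float],
    pending: Sequence[float],
) -> Tuple[float, float]:
    """Conjugate update for exponential responses with a Gamma prior on the rate.

    Assumption (not from the paper): a delayed response is right-censored,
    so it adds its elapsed time to the exposure but not an observed event.
    """
    # Each completed response counts as one observed event.
    new_shape = shape + len(completed)
    # Total exposure is the sum of completed responses and elapsed pending times.
    new_rate = rate + sum(completed) + sum(pending)
    return new_shape, new_rate


def posterior_mean_response(shape: float, rate: float) -> float:
    """Posterior mean of the expected response 1/theta when theta ~ Gamma(shape, rate)."""
    if shape <= 1:
        raise ValueError("the mean of 1/theta is finite only when shape > 1")
    return rate / (shape - 1)


# Two completed responses and one response still pending after 1.0 time units.
state = update_gamma_posterior(2.0, 1.0, completed=[0.5, 1.5], pending=[1.0])
mean_response = posterior_mean_response(*state)
```

The pair returned by `update_gamma_posterior` plays the role of the state here; an index strategy would compare a quantity computed from that state with the known arm's value, a step this sketch does not attempt.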

Keywords:

Optimal stopping
Monotonic function
Mathematical economics
Markov decision process
Gittins index
Initial value problem
Mathematical optimization
Markov process
Stochastic process
Indexation
Mathematics
Exponential distribution
