One-armed bandit models with continuous and delayed responses

2003 
One-armed bandit processes with continuous and delayed responses are formulated as controlled stochastic processes following the Bayesian approach. It is shown that, under some regularity conditions, a Gittins-like index exists; it is the limit of a monotonic sequence of break-even values that characterize the optimal initial selection of arms for finite-horizon bandit processes. Furthermore, there is an optimal stopping solution when all observations on the unknown arm are complete. The results are illustrated with a bandit model having exponentially distributed responses, in which case the controlled stochastic process becomes a Markov decision process, the Gittins-like index is the Gittins index, and the Gittins index strategy is optimal. Copyright Springer-Verlag 2003