Active Reinforcement Learning: Observing Rewards at a Cost

David Krueger, Jan Leike, Owain Evans, John Salvatier

2020

Active reinforcement learning (ARL) is a variant of reinforcement learning in which the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable, so we have to rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and we discuss and illustrate some challenging aspects of the ARL problem.
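To make the ARL protocol concrete, here is a minimal sketch of a multi-armed bandit episode in which the agent earns every reward but only sees a reward when it pays the query cost c. Several details are assumptions for illustration and are not taken from the paper: rewards are Bernoulli, performance is total reward minus total query cost, and the agent uses a hypothetical fixed-budget heuristic. Under that heuristic, the agent queries on the first `query_budget` steps (cycling through arms), then stops querying and always pulls the arm with the best observed mean. The function name `arl_bandit_episode` and its parameters are hypothetical. This heuristic is not one of the authors' methods.

```python
import random


def arl_bandit_episode(arm_means, horizon, query_cost, query_budget, seed=0):
    """Run one Bernoulli bandit episode under the ARL protocol (illustrative sketch).

    Hypothetical fixed-budget heuristic (not from the paper): pay to observe the
    reward on the first `query_budget` steps while cycling through the arms, then
    stop querying and exploit the arm with the highest observed mean reward.
    Returns (total reward - total query cost, number of observed rewards per arm).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of observed (paid-for) rewards per arm
    sums = [0.0] * k      # sum of observed rewards per arm
    total_reward = 0.0    # reward earned, whether or not the agent saw it
    total_cost = 0.0      # query costs paid so far

    def observed_mean(arm):
        # Arms with no observed rewards are ranked last.
        return sums[arm] / counts[arm] if counts[arm] else float("-inf")

    for t in range(horizon):
        querying = t < query_budget
        if querying:
            arm = t % k                           # round-robin while querying
        else:
            arm = max(range(k), key=observed_mean)  # greedy on observed data

        # Assumption: Bernoulli reward with success probability arm_means[arm].
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        total_reward += reward                    # earned even if unobserved

        if querying:
            total_cost += query_cost              # pay c to see this reward
            counts[arm] += 1
            sums[arm] += reward

    return total_reward - total_cost, counts


if __name__ == "__main__":
    for budget in (0, 10, 100):
        net, seen = arl_bandit_episode([0.0, 1.0], horizon=100,
                                       query_cost=0.5, query_budget=budget)
        print(f"budget={budget:3d}  net={net:5.1f}  observed per arm={seen}")
```

With arm means `[0.0, 1.0]`, a horizon of 100 and c = 0.5, a budget of 0 never identifies the good arm and nets 0.0. A budget of 10 nets 90.0. A budget of 100 also nets 0.0, because it pays for information long after that information has stopped being useful. This toy trade-off shows why the value of reward information, rather than accuracy alone, is the quantity an ARL agent has to estimate.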

Keywords:

Reinforcement learning
Heuristics
Computer science
Artificial intelligence
Markov decision process
Heuristic
