Discounted Reinforcement Learning is Not an Optimization Problem

2019 
Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks. This is because it is not an optimization problem: it lacks an objective function. After substantiating these claims, we go on to address some misconceptions about discounting and its connection to the average reward formulation. We encourage researchers to adopt rigorous optimization approaches for reinforcement learning in continuing tasks, such as average reward.