Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning

2018 
Wide adoption of deep networks as function approximators in modern reinforcement learning (RL) is changing the research environment, both with regard to best practices and application domains. Yet our understanding of RL methods has been shaped by theoretical and empirical results with tabular representations and linear function approximators. These results suggest that RL methods using temporal differencing (TD) are superior to direct Monte Carlo (MC) estimation. In this paper, we re-examine the role of TD in modern deep RL, using specially designed environments that each control for a specific factor affecting performance, such as reward sparsity, reward delay, or the perceptual complexity of the task. When comparing TD with infinite-horizon MC, we are able to reproduce the classic results in modern settings characterized by perceptual complexity and deep nonlinear models. However, we also find that finite-horizon MC methods are not inferior to TD, even in sparse or delayed reward tasks, making MC a viable alternative to TD. We discuss the role of perceptual complexity in reconciling these findings with classic empirical results.
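To make the comparison concrete, the sketch below (not taken from the paper) contrasts the value targets used by one-step TD and by finite-horizon MC estimation; the horizon length, discount factor, and toy reward/value arrays are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation): one-step TD
# targets bootstrap from a learned value estimate, while finite-horizon MC
# targets sum discounted rewards over a fixed horizon without bootstrapping.
import numpy as np


def td_target(rewards, values, t, gamma=0.99):
    """One-step TD target: immediate reward plus discounted value of the next state."""
    return rewards[t] + gamma * values[t + 1]


def finite_horizon_mc_return(rewards, t, horizon, gamma=0.99):
    """Finite-horizon MC target: discounted sum of the next `horizon` rewards."""
    steps = rewards[t:t + horizon]
    return sum(gamma ** k * r for k, r in enumerate(steps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rewards = rng.normal(size=20)   # toy reward sequence
    values = rng.normal(size=21)    # toy value estimates, one per state
    t = 5
    print("TD(0) target:            ", td_target(rewards, values, t))
    print("Finite-horizon MC target:", finite_horizon_mc_return(rewards, t, horizon=8))
```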