ExTra: Transfer-guided Exploration.

2019 
The sample efficiency and convergence time of a Reinforcement Learning (RL) algorithm depend heavily on the exploration method used by the agent. In this work, we formulate an exploration method that uses an agent's prior experience on similar tasks in other environments to improve the efficiency of exploration in the current task-environment. We show that, given an optimal policy in a related task-environment, its bisimulation distance from the current task-environment gives a lower bound on the optimal advantage of state-action pairs in the current task-environment.
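
The claimed bound can be read schematically as follows (a sketch under assumed notation, not the paper's exact theorem). Suppose d is a bisimulation-style metric on state-action pairs across the source and current task-environments satisfying the usual Lipschitz property |Q^*_{cur}(s,a) - Q^*_{src}(\bar{s},\bar{a})| \le d((s,a),(\bar{s},\bar{a})), and write \bar{a}^* = \pi^*_{src}(\bar{s}) for the source-optimal action at a matched source state \bar{s}; the metric d, the matching of s to \bar{s}, and the subscripts are assumptions made here for illustration. Lower-bounding Q^*_{cur}(s,a) and upper-bounding V^*_{cur}(s) = Q^*_{cur}(s,\pi^*_{cur}(s)) through the metric then yields a lower bound on the optimal advantage of the form

    A^*_{cur}(s,a) \;\ge\; -\, d\big((s,a),(\bar{s},\bar{a}^*)\big) \;-\; d\big((s,\pi^*_{cur}(s)),(\bar{s},\bar{a}^*)\big),

which illustrates how the bisimulation distance to a source-optimal pair can bound the advantage in the current task-environment from below.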