Learning to Build High-Fidelity and Robust Environment Models

2021 
This paper is concerned with robust learning to simulate (RL2S), a new reinforcement learning (RL) problem that focuses on learning a high-fidelity environment model (i.e., a simulator) to serve diverse downstream tasks. Unlike environment learning in model-based RL, where the learned dynamics model is only suited to providing simulated data for one specific policy, the goal of RL2S is to build a simulator that remains high-fidelity when interacting with a wide range of policies. The key challenge is therefore the robustness of the simulator, i.e., its ability to provide accurate simulations across diverse corner-case policies. By formulating the policy-environment interaction as a dual Markov decision process, we transform RL2S into a novel robust imitation learning problem and propose efficient algorithms to solve it. Experiments on continuous control scenarios demonstrate that the RL2S-enabled methods outperform alternatives in learning high-fidelity simulators for evaluating, ranking, and training various policies.
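The min-max flavor of the abstract, a simulator that must stay accurate for whichever policy probes it, can be pictured as alternating between finding the policy the current model simulates worst and refitting the model on transitions collected under that policy. The sketch below is a toy illustration of that idea only, not the paper's algorithm; the linear dynamics, the policy pool, and names such as real_step and rollout_error are all hypothetical stand-ins.

# A minimal sketch of the robust-simulator idea (hypothetical setup; not the
# paper's method): update the learned model against the policy on which it
# currently shows the worst simulation error.
import numpy as np

rng = np.random.default_rng(0)

# Toy "real" environment: linear dynamics s' = A s + B a + noise.
A_true = np.array([[0.9, 0.1], [0.0, 0.95]])
B_true = np.array([[0.0], [0.5]])

def real_step(s, a):
    return A_true @ s + B_true @ a + 0.01 * rng.standard_normal(2)

# Learned simulator: linear model with parameters (A_hat, B_hat).
A_hat, B_hat = np.zeros((2, 2)), np.zeros((2, 1))

# A pool of policies standing in for "diverse corner cases": linear feedbacks a = K s.
policies = [np.array([[-k, 0.0]]) for k in (0.0, 0.5, 1.0, 2.0)]

def rollout_error(K, horizon=20):
    """Mean one-step prediction error of the simulator under policy a = K s."""
    s = rng.standard_normal(2)
    errs = []
    for _ in range(horizon):
        a = K @ s
        s_next = real_step(s, a)
        errs.append(np.linalg.norm(s_next - (A_hat @ s + B_hat @ a)))
        s = s_next
    return float(np.mean(errs))

for it in range(200):
    # Adversarial step: pick the policy the simulator currently handles worst.
    K_worst = max(policies, key=rollout_error)
    # Model step: fit the simulator on transitions collected under that policy.
    s = rng.standard_normal(2)
    for _ in range(20):
        a = K_worst @ s
        s_next = real_step(s, a)
        pred = A_hat @ s + B_hat @ a
        grad = pred - s_next  # gradient of 0.5 * ||pred - s_next||^2
        A_hat -= 0.05 * np.outer(grad, s)
        B_hat -= 0.05 * np.outer(grad, a)
        s = s_next

print("worst-case error:", max(rollout_error(K) for K in policies))

Training on the worst-case policy rather than a fixed behavior policy is what distinguishes this picture from ordinary model-based RL model fitting, where the model only needs to be accurate along one policy's state distribution.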