Learning Navigation Policies for Mobile Robots in Deep Reinforcement Learning with Random Network Distillation

2021 
Learning navigation policies is the task of training a model that can find collision-free paths for mobile robots, and various Deep Reinforcement Learning (DRL) methods have been applied to it with promising results. However, the natural reward function for the task is usually sparse, i.e., a penalty for a collision and a positive reward for arriving at the target position, which makes the task difficult to learn. In particular, in some complex navigation environments it is hard to find a collision-free path by random exploration, which leads to slow learning and solutions with poor performance. In this paper, we propose a DRL-based approach to train an end-to-end navigation planner, i.e., a policy neural network that directly translates the local grid map and the relative goal of the robot into its moving actions. To handle the sparse reward problem, we augment the normal extrinsic reward from the environment with intrinsic reward signals measured by random network distillation (RND). Specifically, the intrinsic reward is computed from the two networks used in RND, and it encourages the agent to explore states that it has not seen before. The experimental results show that, by augmenting the reward function with RND intrinsic reward signals, our approach learns solutions with better performance more efficiently and more stably. We also deploy the trained model to a real robot, which can perform collision avoidance in navigation tasks without any parameter tuning. A video of our experiments can be found at https://youtu.be/b1GJrWfO8pw.
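
The intrinsic reward described above can be illustrated with a minimal sketch. It assumes the standard RND setup: a fixed, randomly initialised target network and a predictor network trained to imitate it, with the prediction error serving as the intrinsic reward. The class name `RNDSketch`, the toy NumPy networks, the flat observation vector, and the mixing weight `beta` are all hypothetical and are not taken from the paper, which uses a policy network over local grid maps.

```python
import numpy as np


class RNDSketch:
    """Toy RND module: fixed random target, trainable linear predictor (illustrative only)."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Target network: randomly initialised and never updated.
        self.W_t = rng.normal(size=(obs_dim, feat_dim))
        # Predictor network: trained to reproduce the target's output.
        self.W_p = rng.normal(size=(obs_dim, feat_dim)) * 0.1
        self.lr = lr

    def _target(self, obs):
        return np.tanh(obs @ self.W_t)

    def _predict(self, obs):
        return obs @ self.W_p

    def intrinsic_reward(self, obs):
        # Prediction error is large for states the predictor has rarely seen.
        err = self._predict(obs) - self._target(obs)
        return float(np.mean(err ** 2))

    def update(self, obs):
        # One gradient-descent step on the mean squared prediction error.
        err = self._predict(obs) - self._target(obs)
        grad = 2.0 * np.outer(obs, err) / err.size
        self.W_p -= self.lr * grad


def total_reward(r_ext, r_int, beta=0.1):
    # Assumed combination rule: extrinsic reward plus a weighted intrinsic bonus.
    return r_ext + beta * r_int
```

In this sketch, repeatedly calling `update` on the same observation lowers its `intrinsic_reward`, so the bonus fades for familiar states and stays high for unvisited ones, which is the exploration effect the abstract describes.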