Policy Learning with Human Reinforcement

2016 
A reinforcement learning agent learns an optimal policy for a given environment through an evaluative reward function. In some applications, however, such an agent lacks the adaptability to handle scenarios that deviate even slightly from those encountered during learning; in other words, the agent is expected to satisfy additional demands that are not captured by the reward function. This paper proposes an interactive approach that combines human reinforcement and environmental rewards into a shaped reward function, so that a robot can be coached despite goal modifications or an inaccurate reward. The approach coaches a robot already equipped with a reinforcement learning mechanism, using human reinforcement feedback to overcome deficiencies or shortsightedness in the environmental reward function. The proposed algorithm links direct policy evaluation and human reinforcement to the autonomous robot, shaping the reward function by combining both reinforcement signals. Relative information entropy is applied to resolve conflicts between the human reinforcement and the robot's current policy, yielding more effective learning. In this work, human coaching is conveyed through a bystander's facial expressions, which a type-2 fuzzy system transforms into a scalar reinforcement index. Simulated and experimental results show that a short-sighted robot walked successfully through a swamp, and an under-powered car reached the top of a mountain, when coached by a bystander. The learning system was fast enough that the robot could continually adapt to an altered goal or environment.
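A minimal sketch of the reward-shaping idea described above, assuming a tabular Q-learning agent whose environmental reward is blended with a scalar human reinforcement signal (such as the index produced by the facial-expression fuzzy system). The class name, linear blending rule, and weight are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class CoachedQAgent:
    """Tabular Q-learning agent whose reward is shaped by human feedback."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95,
                 epsilon=0.1, human_weight=0.5, seed=0):
        self.q = np.zeros((n_states, n_actions))
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.human_weight = human_weight          # assumed blending weight
        self.rng = np.random.default_rng(seed)

    def act(self, state):
        # Epsilon-greedy action selection over the learned Q-values.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.q.shape[1]))
        return int(np.argmax(self.q[state]))

    def shaped_reward(self, env_reward, human_reinforcement):
        # Illustrative linear blend of the environmental reward with a
        # scalar human reinforcement signal (e.g., derived from a
        # bystander's facial expressions); the paper's exact shaping
        # rule may differ.
        return env_reward + self.human_weight * human_reinforcement

    def update(self, state, action, env_reward, human_reinforcement, next_state):
        # Standard Q-learning update applied to the shaped reward that
        # combines the environmental and human reinforcement signals.
        r = self.shaped_reward(env_reward, human_reinforcement)
        td_target = r + self.gamma * np.max(self.q[next_state])
        self.q[state, action] += self.alpha * (td_target - self.q[state, action])
```

In this sketch the human signal simply shifts the reward seen by the learner; the paper additionally uses relative information entropy to arbitrate when the human feedback conflicts with the robot's current policy, which is not modeled here.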