Reward system design for incorporating control performance

2015 
Reinforcement learning (RL) is a machine learning technique in which a controller learns a control law by optimizing the cumulative reward it receives. A reward is an instantaneous evaluation of the action applied in the current state, given by a reward function. Although in theory the reward function is assumed to be given, in practice designing a good reward function is effort-consuming work. The reward is the only information about the learning task given to the controller, so optimizing the cumulative reward corresponds to fulfilling a particular control performance. Designing the reward function to achieve the desired control specification is therefore a crucial task when using RL as a controller-synthesis algorithm. The goal of this thesis is to develop a method for designing reward functions that achieve the desired control performance. The thesis focuses on two types of control performance. In the first part, a reward function is designed to learn a control law fulfilling classical control performance criteria. Here, an automaton is created to evaluate classical control criteria via mode-dependent reward functions. By modeling the process with an automaton, the control problem is divided into smaller subproblems so that the reward functions remain simple. In the second part, a temporal logic specification is converted into a reward system in which a Petri net models the process in a form suitable for rewarding. For a given temporal formula, a reward function is assigned as a function of the state, i.e. the marking, of the Petri net. By encoding information about the task into the Petri net, the reward function becomes simple and structured. Simulation experiments are carried out for several temporal logic specifications.
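The mode-dependent reward idea in the first part can be illustrated with a minimal sketch. The monitoring automaton, its two modes, the tolerance band, and the reward shapes below are illustrative assumptions for a step-response task, not the thesis' actual construction: the automaton's mode switches once the process output enters a band around the setpoint, and a different reward function is applied in each mode.

```python
# Minimal sketch of a mode-dependent reward for a step-response task.
# The automaton, its modes ('rise'/'settle'), the tolerance band, and
# the reward weights are illustrative assumptions, not the thesis' design.

def automaton_step(mode, output, setpoint, tol=0.05):
    """Monitoring automaton: switch from 'rise' to 'settle' once the
    process output enters the tolerance band around the setpoint."""
    if mode == "rise" and abs(output - setpoint) <= tol:
        return "settle"
    return mode

def reward(mode, output, setpoint):
    """Mode-dependent reward: in 'rise', mildly penalize distance to the
    setpoint (fast rise time); in 'settle', penalize deviation heavily
    (small overshoot and steady-state error)."""
    error = abs(output - setpoint)
    if mode == "rise":
        return -error
    return -10.0 * error

# Usage: accumulate reward along a short trajectory toward setpoint 1.0.
trajectory = [0.0, 0.5, 0.9, 0.98, 1.02, 1.0]
mode, total = "rise", 0.0
for x in trajectory:
    mode = automaton_step(mode, x, 1.0)
    total += reward(mode, x, 1.0)
```

Because each mode sees only its own subproblem (reach the band, then stay in it), each per-mode reward function stays simple, which is the point the abstract makes about dividing the control problem via the automaton.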