Off-Policy Q-Learning for Infinite Horizon LQR Problem with Unknown Dynamics

2018 
In this paper, a novel online Q-learning approach is proposed to solve the Infinite Horizon Linear Quadratic Regulator (IHLQR) problem for continuous-time (CT) linear time-invariant (LTI) systems. By employing off-policy reinforcement learning (RL), the proposed Q-learning algorithm improves its exploration of the state space. During learning, the algorithm can be implemented using only data sets that contain information on the behavior policy and the corresponding system states, so it is data-driven. Moreover, the data sets can be reused, which makes the algorithm computationally efficient. A mild condition on the probing noise is established to ensure the convergence of the proposed Q-learning algorithm. Simulation results demonstrate the effectiveness of the developed algorithm.
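The abstract describes learning an optimal LQR gain from data generated by a behavior policy with probing noise, with one recorded data set reused in every iteration. The sketch below illustrates that data-reuse idea for a hypothetical scalar plant using off-policy integral policy iteration (a Kleinman-style policy evaluation and gain update solved by least squares). This is a closely related technique swapped in for illustration, not the paper's Q-learning algorithm. The plant parameters, cost weights, sinusoidal probing noise, interval length, and all function names (`collect_data`, `off_policy_pi`, `solve_2x2`) are assumptions. The dynamics `a` and `b` are used only to simulate the data; the learner sees only samples of `x` and `u`.

```python
import math

# Hypothetical scalar plant dx/dt = a*x + b*u (used ONLY to simulate data).
A_TRUE, B_TRUE = 1.0, 1.0
Q_COST, R_COST = 1.0, 1.0  # assumed cost integrand: q*x^2 + r*u^2


def collect_data(k0, t_interval=0.1, n_intervals=100, dt=1e-4, x0=1.0):
    """Run the behavior policy u = -k0*x + e(t) once and record, per interval,
    (x(t+T)^2 - x(t)^2, integral of x*u, integral of x^2), via forward Euler."""
    rows = []
    x, t = x0, 0.0
    steps = int(round(t_interval / dt))
    for _ in range(n_intervals):
        x_start, ixu, ixx = x, 0.0, 0.0
        for _ in range(steps):
            # Assumed probing noise: a sum of sinusoids for persistent excitation.
            e = 0.5 * (math.sin(3.0 * t) + math.sin(7.1 * t) + math.sin(13.3 * t))
            u = -k0 * x + e
            ixu += x * u * dt
            ixx += x * x * dt
            x += dt * (A_TRUE * x + B_TRUE * u)
            t += dt
        rows.append((x * x - x_start * x_start, ixu, ixx))
    return rows


def solve_2x2(m, v):
    """Solve the 2x2 linear system m @ theta = v by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((v[0] * m[1][1] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det)


def off_policy_pi(rows, k0, iterations=10):
    """Evaluate gain k and compute the improved gain from the SAME data each time.
    Each interval gives one equation in the unknowns (p, k_next):
        p*dxx - 2*r*k_next*(ixu + k*ixx) = -(q + r*k^2)*ixx
    which is solved in the least-squares sense via the normal equations."""
    k, p = k0, None
    for _ in range(iterations):
        m = [[0.0, 0.0], [0.0, 0.0]]
        v = [0.0, 0.0]
        for dxx, ixu, ixx in rows:
            phi = (dxx, -2.0 * R_COST * (ixu + k * ixx))
            y = -(Q_COST + R_COST * k * k) * ixx
            for i in range(2):
                v[i] += phi[i] * y
                for j in range(2):
                    m[i][j] += phi[i] * phi[j]
        p, k = solve_2x2(m, v)  # value of the current gain, improved gain
    return p, k


if __name__ == "__main__":
    k0 = 2.0                        # initial stabilizing gain: a - b*k0 = -1 < 0
    data = collect_data(k0)         # collected once ...
    p, k = off_policy_pi(data, k0)  # ... reused in every iteration
    # Closed-form solution of the scalar Riccati equation 2ap - b^2 p^2 / r + q = 0.
    p_star = R_COST * (A_TRUE + math.sqrt(A_TRUE ** 2 + B_TRUE ** 2 * Q_COST / R_COST)) / B_TRUE ** 2
    print(f"learned p={p:.4f}, k={k:.4f}; Riccati p*={p_star:.4f}")
```

With these assumed parameters both learned values should approach 1 + sqrt(2), up to a small error from the Euler discretization. Because the behavior gain `k0` is fixed while the evaluated gain changes every iteration, the update is off-policy, and no new trajectory is needed after the first run.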