A composite learning method for multi-ship collision avoidance based on reinforcement learning and inverse control

2020 
Abstract Model-free reinforcement learning methods have potentials in ship collision avoidance under unknown environment. To defect the low efficiency problem of the model-free reinforcement learning, a composite learning method is proposed based on an asynchronous advantage actor-critic (A3C) algorithm, a long short-term memory neural network (LSTM) and Q-learning. The proposed method uses Q-learning for adaptive decisions between a LSTM inverse model-based controller and the model-free A3C policy. Multi-ship collision avoidance simulations are conducted to verify the effectiveness of the model-free A3C method, the proposed inverse model-based method and the composite learning method. The simulation results indicate that the proposed composite learning based ship collision avoidance method outperforms the A3C learning method and a traditional optimization-based method.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    54
    References
    5
    Citations
    NaN
    KQI
    []