Online food ordering delivery strategies based on deep reinforcement learning

2021 
With the rapid development of Online to Offline (O2O) business, millions of transactions are performed on popular online food ordering platforms each day. Efficient order dispatching and dynamic adjustment of delivery routes are critical to the success of O2O platforms. However, the vast volume of transactions and the computational complexity of delivery routing pose significant challenges to efficient order dispatching. The action of dispatching an order and the resulting state transition of the couriers form a Markov decision process (MDP), and reinforcement learning has proven capable of handling MDPs. This paper proposes a Double Deep Q-Network (Double-DQN) based reinforcement learning framework that gradually tests and learns the order dispatching policy by interacting with an O2O simulation model developed with SUMO. Preliminary experimental results on real order data demonstrate the effectiveness and efficiency of the proposed Double-DQN based order dispatcher. In addition, different state encoding schemes are designed and tested to improve the performance of the Double-DQN based dispatcher.
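For readers unfamiliar with Double-DQN, the following minimal Python sketch illustrates the standard Double-DQN bootstrap target: the online network selects the greedy next action and a separate target network evaluates it. It is not the paper's implementation. The abstract does not specify the action space, reward, or discount factor, so the function name `double_dqn_target`, the reading of actions as candidate couriers, and all numeric values are assumptions made for illustration only.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Return the Double-DQN bootstrap target for a single transition."""
    if done:
        # Terminal transition: there is no future value to bootstrap from.
        return reward
    # The online network picks the greedy next action
    # (assumed here to mean: which courier receives the order).
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # The target network evaluates that action, which reduces the
    # overestimation bias of plain DQN's max over a single network.
    return reward + gamma * next_q_target[best_action]


# Hypothetical example: three candidate couriers for one incoming order.
target = double_dqn_target(
    reward=1.0,
    next_q_online=[0.2, 0.9, 0.5],
    next_q_target=[0.3, 0.4, 0.8],
    gamma=0.9,
)
print(target)
```

In this example the online network prefers the second courier, so the target uses the target network's estimate for that courier rather than the target network's own maximum.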