Expert Level Control of Ramp Metering Based on Multi-Task Deep Reinforcement Learning

2018 
This paper shows how the recent breakthroughs in reinforcement learning (RL) that have enabled robots to learn to play arcade video games, walk, or assemble colored bricks, can be used to perform other tasks that are currently at the core of engineering cyberphysical systems. We present the first use of RL for the control of systems modeled by discretized non-linear partial differential equations (PDEs) and devise a novel algorithm to use non-parametric control techniques for large multi-agent systems. Cyberphysical systems (e.g., hydraulic channels, transportation systems, the energy grid, and electromagnetic systems) are commonly modeled by PDEs, which historically have been a reliable way to enable engineering applications in these domains. However, it is known that the control of these PDE models is notoriously difficult. We show how neural network-based RL enables the control of discretized PDEs whose parameters are unknown, random, and time-varying. We introduce an algorithm of mutual weight regularization (MWR), which alleviates the curse of dimensionality of multi-agent control schemes by sharing experience between agents while giving each agent the opportunity to specialize its action policy so as to tailor it to the local parameters of the part of the system it is located in. A discretized PDE, such as the scalar Lighthill–Whitham–Richards PDE can indeed be considered as a macroscopic freeway traffic simulator and which presents the most salient challenges for learning to control large cyberphysical system with multiple agents. We consider two different discretization procedures and show the opportunities offered by applying deep reinforcement for continuous control on both. Using a neural RL PDE controller on a traffic flow simulation based on a Godunov discretization of the San Francisco Bay Bridge, we are able to achieve precise adaptive metering without model calibration thereby beating the state of the art in traffic metering. Furthermore, with the more accurate BeATS simulator, we manage to achieve a control performance on par with ALINEA, a state-of-the-art parametric control scheme, and show how using MWR improves the learning procedure.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    26
    References
    90
    Citations
    NaN
    KQI
    []