Piecewise Constant Policies for Human-Compatible Congestion Mitigation

2021 
Transportation systems have a significant impact on carbon emissions and urban mobility worldwide. Previous work has shown that reinforcement-learning-based control of autonomous vehicles (AVs) can smooth traffic flow and reduce congestion. Because of the safety and interpretability concerns around deploying AVs, we instead propose structured reinforcement learning policies that provide real-time guidance to human drivers to stabilize traffic. Motivated by the limitations of driver reaction times, we introduce a class of piecewise constant policies designed to be executed by human drivers to mitigate congestion. These policies are defined by a simple action space and an action extension parameter Δ, under which each action must be held constant for Δ timesteps. We prove that the optimal piecewise constant trajectory deviates in reward from the optimal unconstrained trajectory by at most O(HΔ) over a horizon of length H. We show that piecewise constant policies can reproduce traffic smoothing in single-lane settings for large action extension parameters (up to 14 s), yielding near-optimal average velocities above 4.2 m/s compared to 3.6 m/s for a human-driver baseline. Moreover, these policies still yield some improvement for action extension parameters up to 24 s and are robust to lane changing, indicating promise for more complex environments.
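The action-extension mechanism described above can be pictured as a thin wrapper around any controller that re-queries it only once every Δ timesteps. The following Python sketch is ours, not the authors' implementation: the `PiecewiseConstantPolicy` class, its `act` interface, the toy proportional speed controller, and the 0.1 s simulation timestep are all illustrative assumptions.

```python
from typing import Any, Callable


class PiecewiseConstantPolicy:
    """Hold each action of a base policy constant for `delta` timesteps.

    Sketch of the abstract's action-extension idea: the wrapped policy may
    only change its output every `delta` steps, keeping the resulting
    guidance slow enough for a human driver to follow. Names and interface
    are our own, not the paper's.
    """

    def __init__(self, base_policy: Callable[[Any], Any], delta: int):
        if delta < 1:
            raise ValueError("delta must be a positive number of timesteps")
        self.base_policy = base_policy
        self.delta = delta
        self._held_action = None
        self._steps_held = 0

    def act(self, observation: Any) -> Any:
        # Re-query the base policy only once every `delta` timesteps;
        # otherwise repeat the previously chosen action.
        if self._held_action is None or self._steps_held >= self.delta:
            self._held_action = self.base_policy(observation)
            self._steps_held = 0
        self._steps_held += 1
        return self._held_action


# Example: a toy proportional speed controller held constant for
# delta = 140 steps (14 s at an assumed 0.1 s simulation step).
target_speed = 4.2  # m/s, the near-optimal average velocity from the abstract
controller = lambda obs: 0.5 * (target_speed - obs["speed"])
policy = PiecewiseConstantPolicy(controller, delta=140)
action = policy.act({"speed": 3.6})
```

Under this structure, larger Δ trades control responsiveness for human executability, which is the tension the O(HΔ) bound quantifies.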