Robust Action Selection in Partially Observable Markov Decision Processes with Model Uncertainty

2018 
Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty in both state transitions and sensing of the underlying state. Model uncertainty becomes an important concern when the models for which an action policy was optimized change over time, e.g., degrading sensors that cause a drift in the observation function. Replanning a policy whenever a model drifts (if feasible at all) is both time-consuming and computationally expensive. At the other extreme, ignoring the drift and following the original policy can lead to high-risk actions with high costs. We present an efficient approach that post-processes a policy computed from the initial models to select actions that are robust to changes in the observation function. The key idea is to maintain a belief region, rather than a belief point, about the state of the system, and to perform online robust action selection with respect to the current belief region. Specifically, we formulate a convex optimization problem to select the action that maximizes the worst-case reward over a convexified belief region. Simulation results demonstrate the ability of our approach to avoid high-risk actions when the system is in uncertain states.
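As a rough illustration of the maximin selection rule sketched in the abstract, the following Python snippet picks the action with the best worst-case value over a belief region. It is only a minimal sketch under assumptions not stated in the paper: the belief region is approximated by a finite set of vertex beliefs, and each action's value is given by a precomputed alpha vector; the names `belief_vertices` and `alpha_vectors` are illustrative placeholders, not the authors' formulation, which uses a convex optimization over the convexified region.

```python
import numpy as np

def robust_action(belief_vertices, alpha_vectors):
    """Pick the action maximizing the worst-case value over a belief region.

    belief_vertices : (K, S) array, each row a belief (a vertex of the region)
    alpha_vectors   : dict mapping action -> (S,) value vector for that action
    """
    best_action, best_worst_case = None, -np.inf
    for action, alpha in alpha_vectors.items():
        # Worst-case value of this action over the belief region: since the
        # expected value is linear in the belief, its minimum over a convex
        # polytope is attained at a vertex, so checking vertices suffices.
        worst_case = min(float(b @ alpha) for b in belief_vertices)
        if worst_case > best_worst_case:
            best_action, best_worst_case = action, worst_case
    return best_action, best_worst_case

# Toy example: 2-state belief region and two candidate actions.
vertices = np.array([[0.6, 0.4], [0.8, 0.2]])
alphas = {"safe": np.array([1.0, 1.0]), "risky": np.array([5.0, -10.0])}
print(robust_action(vertices, alphas))  # -> ('safe', 1.0)
```

In the toy example the "risky" action has a higher value at one vertex but a much worse value at the other, so the maximin rule prefers the "safe" action, mirroring the paper's goal of avoiding high-risk actions when the belief is uncertain.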