Online Partial Conditional Plan Synthesis for POMDPs With Safe-Reachability Objectives: Methods and Experiments

2021 
The framework of partially observable Markov decision processes (POMDPs) offers a standard approach to modeling uncertainty in many robot tasks. Traditionally, POMDPs are formulated with optimality objectives. In this article, we study a different formulation of POMDPs with Boolean objectives. For robotic domains that require a correctness guarantee of accomplishing tasks, Boolean objectives are natural formulations. We investigate the problem of POMDPs with a common Boolean objective: safe reachability, which requires that the robot eventually reach a goal state with a probability above a threshold while keeping the probability of visiting unsafe states below a different threshold. Our approach builds on previous work that represents POMDPs with Boolean objectives using symbolic constraints. We employ a satisfiability modulo theories (SMT) solver to efficiently search for solutions, i.e., policies or conditional plans that specify the action to take contingent on every possible event.

A full policy or conditional plan is generally expensive to compute. To improve computational efficiency, we introduce the notion of partial conditional plans, which cover only sampled events and approximate a full conditional plan. Our approach constructs a partial conditional plan parameterized by a replanning probability, and we prove that the failure rate of the constructed partial conditional plan is bounded by this replanning probability. Users can therefore specify an appropriate bound on the replanning probability to balance efficiency and correctness. Moreover, we update this bound during construction to quickly detect whether the current partial conditional plan meets the bound, avoiding unnecessary computation. To further improve efficiency, we cache partial conditional plans for sampled belief states and reuse these cached plans when possible. We validate our approach in several robotic domains; the results show that it outperforms a previous policy synthesis approach for POMDPs with safe-reachability objectives in these domains.
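To make the construction concrete, the following Python sketch illustrates the core idea under simplifying assumptions; it is not the authors' implementation. A toy two-state model stands in for a real POMDP, and choose_action is a hypothetical placeholder for the per-step SMT query over symbolic constraints. At each belief, the sketch expands observation branches in decreasing likelihood and stops once the uncovered observation mass, which would be handled by online replanning, falls below a user-specified bound delta.

# A minimal sketch (not the authors' implementation) of building a
# partial conditional plan whose uncovered observation mass stays
# below a replanning bound `delta`. The toy model and the
# `choose_action` placeholder are hypothetical stand-ins.
from typing import Dict, Tuple

State, Action, Obs = str, str, str
Belief = Dict[State, float]

STATES = ["s_safe", "s_goal"]
OBSERVATIONS = ["o_near", "o_far"]

def trans(s: State, a: Action, s2: State) -> float:
    # P(s2 | s, a): in this toy model, the move reaches the goal w.p. 0.8.
    return 0.8 if s2 == "s_goal" else 0.2

def obs_model(s2: State, a: Action, o: Obs) -> float:
    # P(o | s2, a): the observation weakly reveals the state.
    return 0.7 if (o == "o_near") == (s2 == "s_goal") else 0.3

def belief_update(b: Belief, a: Action, o: Obs) -> Tuple[Belief, float]:
    # One Bayes-filter step; also returns P(o | b, a).
    unnorm = {
        s2: obs_model(s2, a, o) * sum(b[s] * trans(s, a, s2) for s in b)
        for s2 in STATES
    }
    p_o = sum(unnorm.values())
    return ({s: p / p_o for s, p in unnorm.items()} if p_o else {}), p_o

def choose_action(b: Belief) -> Action:
    # Placeholder for the per-step SMT-based safe action query.
    return "move"

def build_partial_plan(b: Belief, horizon: int, delta: float) -> dict:
    # Expand observation branches in decreasing likelihood until the
    # uncovered mass drops below delta; uncovered branches replan online.
    if horizon == 0:
        return {"action": None, "branches": {}, "replan_mass": 0.0}
    a = choose_action(b)
    successors = sorted(
        ((o,) + belief_update(b, a, o) for o in OBSERVATIONS),
        key=lambda t: -t[2],
    )
    branches, uncovered = {}, 1.0
    for o, b2, p_o in successors:
        if uncovered <= delta:  # remaining mass already within the bound
            break
        branches[o] = build_partial_plan(b2, horizon - 1, delta)
        uncovered -= p_o
    return {"action": a, "branches": branches,
            "replan_mass": max(0.0, uncovered)}  # clamp fp noise

plan = build_partial_plan({"s_safe": 1.0, "s_goal": 0.0}, horizon=2, delta=0.2)
print(plan["action"], sorted(plan["branches"]), round(plan["replan_mass"], 3))

Note that this sketch enforces a per-node bound on the uncovered mass, which is a simplification; the article's guarantee concerns the overall replanning probability of the constructed plan, which the bound update described above maintains across the whole construction.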
Note to Practitioners: This article was motivated by two observations. On the one hand, in robotics applications where uncertainty in sensing and actuation is present, solutions to the classical partially observable Markov decision process (POMDP) formulation are in general expensive to compute. On the other hand, in certain practical scenarios, formulations other than the classical POMDP are more natural and can provide flexibility in balancing efficiency and correctness. This article considers a modified POMDP formulation that includes a Boolean objective, namely safe reachability, and uses the notion of a partial conditional plan: rather than explicitly enumerating all possible observations to construct a full conditional plan, this work samples a subset of observations while ensuring a bounded replanning probability. Our theoretical and empirical results show that the failure rate of the constructed partial conditional plan is bounded by the replanning probability. Moreover, these partial conditional plans can be cached to further improve performance. Our results suggest that for domains where replanning is easy, increasing the replanning probability bound usually leads to better scalability, whereas for domains where replanning is difficult or impossible in some states, we can decrease the bound and allocate more computation time to achieve a higher success rate. Hence, in certain cases, practitioners can take advantage of their knowledge of the problem domain to scale to larger problems. Preliminary physical experiments suggest that this approach is applicable to real-world robotic domains, but it requires a discrete representation of the workspace; handling continuous workspaces directly is an interesting direction for future work.
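The caching idea mentioned in the note can be sketched by extending the code above: partial plans are stored under a discretized belief key so that nearby sampled beliefs reuse a previously computed plan. The rounding-based key is an illustrative assumption, not the paper's reuse criterion.

# Extends the sketch above: a hypothetical plan cache keyed by a
# discretized belief, so nearby sampled beliefs can reuse a plan.

def belief_key(b: Belief, digits: int = 2) -> tuple:
    # Round probabilities so that similar beliefs map to the same key
    # (an illustrative choice, not the paper's reuse criterion).
    return tuple(sorted((s, round(p, digits)) for s, p in b.items()))

PLAN_CACHE: dict = {}

def cached_partial_plan(b: Belief, horizon: int, delta: float) -> dict:
    key = (belief_key(b), horizon, delta)
    if key not in PLAN_CACHE:
        PLAN_CACHE[key] = build_partial_plan(b, horizon, delta)
    return PLAN_CACHE[key]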