Constrained Reinforcement Learning via Policy Splitting

2020 
We develop a model-free reinforcement learning approach for solving constrained Markov decision processes in which the objective and the budget constraints are infinite-horizon discounted expectations and the rewards and costs are learned sequentially from data. We propose a two-stage procedure that first searches over deterministic policies and then aggregates them via a search over a mixture parameter, producing policies with simultaneous guarantees on near-optimality and feasibility. We illustrate our approach numerically on an online advertising problem.
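As one concrete reading of the second stage, here is a minimal Python sketch under assumptions of my own, not taken from the paper: a toy environment (env_step), two candidate deterministic policies (pol_hi, pol_lo), and plain Monte Carlo estimates of discounted cost. The key structural fact it illustrates is that a policy which randomizes once at time 0 between two deterministic policies has a discounted cost linear in the mixture weight, so the weight meeting the budget has a closed form. The paper's actual estimators, search procedure, and guarantees are not reproduced here.

```python
"""Sketch of a two-stage split: estimate candidate deterministic policies,
then mix a high-reward policy with a feasible one to meet a cost budget.
All names and the toy environment are illustrative assumptions."""
import random

GAMMA = 0.95    # discount factor
BUDGET = 8.0    # budget on the discounted expected cost
HORIZON = 200   # truncation horizon for rollouts

def rollout(policy, env_step, init_state, horizon=HORIZON):
    """Return (discounted reward, discounted cost) of one episode."""
    s, ret, cost, disc = init_state, 0.0, 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)
        s, r, c = env_step(s, a)
        ret += disc * r
        cost += disc * c
        disc *= GAMMA
    return ret, cost

def estimate(policy, env_step, init_state, n=2000):
    """Monte Carlo estimates of the discounted reward and cost."""
    samples = [rollout(policy, env_step, init_state) for _ in range(n)]
    return (sum(x for x, _ in samples) / n,
            sum(y for _, y in samples) / n)

def split_policies(pol_hi, pol_lo, cost_hi, cost_lo, budget):
    """Stage 2: pick the mixture weight on the high-reward policy.

    Randomizing once at time 0 makes the discounted cost linear in the
    weight lam: lam * cost_hi + (1 - lam) * cost_lo, so the weight that
    meets the budget with equality has a closed form (clipped to [0, 1]).
    """
    if abs(cost_hi - cost_lo) < 1e-12:          # degenerate case
        lam = 1.0 if cost_hi <= budget else 0.0
    else:
        lam = max(0.0, min(1.0, (budget - cost_lo) / (cost_hi - cost_lo)))
    def draw_policy():
        # Draw one deterministic policy at time 0 and follow it forever.
        return pol_hi if random.random() < lam else pol_lo
    return lam, draw_policy

# Toy usage: action 1 earns more reward but also incurs more cost.
def env_step(s, a):
    r = (2.0 if a == 1 else 1.0) + 0.1 * random.random()
    c = 1.0 if a == 1 else 0.2
    return s, r, c

pol_hi = lambda s: 1   # greedy: high reward, likely over budget
pol_lo = lambda s: 0   # conservative: feasible
_, cost_hi = estimate(pol_hi, env_step, 0)
_, cost_lo = estimate(pol_lo, env_step, 0)
lam, draw_policy = split_policies(pol_hi, pol_lo, cost_hi, cost_lo, BUDGET)
print(f"mixture weight on high-reward policy: {lam:.3f}")
```

Randomizing once at the start, rather than independently at every step, is what keeps the discounted cost linear in the mixture weight and reduces the second-stage search to a single scalar parameter.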