Risk Averse Reinforcement Learning for Mixed Multi-agent Environments
2019
Most real world applications of multi-agent systems, need to keep a balance between maximizing the rewards and minimizing the risks. In this work we consider a popular risk measure, variance of return (VOR), as a constraint in the agent's policy learning algorithm in the mixed cooperative and competitive environments. We present a multi-timescale actor critic method for risk sensitive Markov games where the risk is modeled as a VOR constraint. We also show that the risk-averse policies satisfy the desired risk constraint without compromising much on the overall reward for a popular task.
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
14
References
0
Citations
NaN
KQI