Delay-Aware Stochastic Resource Management for Mobile Edge Computing Systems via Constrained Reinforcement Learning
2021
We design a joint radio and computational resource allocation policy for a multi-user mobile edge computing system that minimizes the expected power consumption subject to long-term delay constraints. The problem is formulated as a constrained Markov decision process (CMDP) and solved efficiently by the proposed constrained reinforcement learning (CRL) algorithm, called successive convex programming based policy optimization (SCPPO). At each update, SCPPO solves a convex objective/feasibility surrogate problem, and it provably converges almost surely to a Karush-Kuhn-Tucker (KKT) point of the original CMDP problem under some mild conditions. Moreover, SCPPO adopts an application-specific policy architecture and employs a data-efficient estimation strategy that can reuse old experiences, enabling fast learning with low computational complexity.
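For orientation, the constrained problem described above can be written in a generic CMDP form. This is only a sketch under stated assumptions: the abstract does not fix any notation, so the policy parameter $\theta$, the per-step power cost $P$, the per-user delay cost $D_k$, the delay budgets $d_k$, the number of users $K$, and the long-term average criterion are all illustrative choices, not the paper's own definitions.

```latex
% Generic CMDP sketch (all symbols are assumed, not taken from the paper):
% minimize long-term average power subject to per-user long-term delay budgets.
\begin{align}
  \min_{\theta} \quad
    & J_0(\theta) \triangleq \limsup_{T \to \infty} \frac{1}{T}\,
      \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=0}^{T-1} P(s_t, a_t) \right] \\
  \text{s.t.} \quad
    & J_k(\theta) \triangleq \limsup_{T \to \infty} \frac{1}{T}\,
      \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=0}^{T-1} D_k(s_t, a_t) \right] - d_k \le 0,
      \qquad k = 1, \dots, K.
\end{align}

% One plausible reading of the "objective/feasibility surrogate problem":
% with convex surrogates \bar{J}_i^{(n)} of J_i built at iteration n,
% solve the objective problem when it is feasible,
\begin{equation}
  \bar{\theta}^{(n)} = \arg\min_{\theta} \; \bar{J}_0^{(n)}(\theta)
  \quad \text{s.t.} \quad \bar{J}_k^{(n)}(\theta) \le 0, \quad k = 1, \dots, K,
\end{equation}
% and otherwise minimize the worst surrogate constraint violation y:
\begin{equation}
  \bar{\theta}^{(n)} = \arg\min_{\theta,\, y} \; y
  \quad \text{s.t.} \quad \bar{J}_k^{(n)}(\theta) \le y, \quad k = 1, \dots, K.
\end{equation}
```

Under this reading, the feasibility problem steers the policy back toward the feasible region whenever the surrogate delay constraints cannot all be met, while the objective problem reduces power once they can.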