Efficient Solutions for Targeted Control of Multi-Agent MDPs

2021 
This work considers multi-agent systems in which agents' decisions are influenced by a global entity, called the superplayer. The superplayer's goal is to influence the agents so that game-play is realized at target joint strategies. By defining each player's utility as a function of the superplayer's influence, the problem can be formulated and solved as a parametric game. We focus on systems whose agents can be clustered; consequently, the superplayer can partition the agents and then find a control policy for each cluster. Under Markovian repeated-play dynamics, an overall control policy is found by solving a related Markov decision process (MDP). This cluster-based control policy generally yields a greater optimal value than a policy that assigns the same control to all agents. To efficiently solve the MDP over the larger action space, we introduce a Clustered Value Iteration (CVI) algorithm, which solves the cluster-based MDP via a "round robin" approach. CVI provides significant computational savings compared to standard value iteration (VI), as the complexity of each iteration scales linearly (versus exponentially) with the number of clusters. Properties of CVI are established and used to prove its convergence. Finally, simulations empirically show that CVI converges significantly faster than VI, with only a small penalty in value.
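The abstract names CVI's "round robin" structure and its linear-versus-exponential per-iteration cost but does not reproduce the algorithm itself. The following is a minimal, hypothetical Python sketch contrasting standard tabular value iteration over the joint action space (A^K joint actions for K clusters) with a round-robin scheme that improves one cluster's action at a time while holding the others fixed. All names (`standard_vi`, `clustered_vi`, the tabular `P` and `R` arrays) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def joint_index(actions, A):
    """Flatten a per-cluster action tuple into a joint-action index in [0, A**K)."""
    idx = 0
    for a in actions:
        idx = idx * A + a
    return idx

def standard_vi(P, R, gamma, S, K, A, iters=100):
    """Standard VI: each sweep maximizes over all A**K joint actions.
    P has shape (S, A**K, S); R has shape (S, A**K)."""
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * (P @ V)          # shape (S, A**K)
        V = Q.max(axis=1)
    return V

def clustered_vi(P, R, gamma, S, K, A, iters=100):
    """Round-robin CVI sketch: each iteration cycles through the clusters,
    improving one cluster's action with the others held fixed, so a full
    iteration evaluates K*A candidates per state instead of A**K."""
    V = np.zeros(S)
    policy = np.zeros((S, K), dtype=int)  # per-state, per-cluster action
    for _ in range(iters):
        for k in range(K):                # round robin over clusters
            for s in range(S):
                best_a, best_q = policy[s, k], -np.inf
                for a in range(A):        # only this cluster's A actions
                    acts = policy[s].copy()
                    acts[k] = a
                    j = joint_index(acts, A)
                    q = R[s, j] + gamma * (P[s, j] @ V)
                    if q > best_q:
                        best_q, best_a = q, a
                policy[s, k] = best_a
                V[s] = best_q
    return V, policy

# Toy comparison on a random MDP: S=4 states, K=3 clusters, A=2 actions per cluster.
rng = np.random.default_rng(0)
S, K, A = 4, 3, 2
P = rng.random((S, A**K, S))
P /= P.sum(axis=2, keepdims=True)         # normalize transition rows
R = rng.random((S, A**K))
print(standard_vi(P, R, 0.9, S, K, A))
print(clustered_vi(P, R, 0.9, S, K, A)[0])
```

In this reading, the round-robin pass is a block-coordinate maximization of the Bellman backup: each pass is cheap and monotonically improving per cluster, but it may settle slightly below the jointly optimal VI value, consistent with the "small penalty in value" the abstract reports.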