Fast learning of approximation policies for coordination in distributed networks

2014 
This paper presents the Average-Max Reinforcement Learning (AMRL) algorithm, which approximates the global policy of a Markov Decision Process (MDP) as a set of local policies that can be executed in a partially observable environment. The local policies are obtained by reinforcement learning and by averaging state-action tables under a stochastic process model. This approach overcomes the scalability problem that arises when a large MDP must be solved exactly. The approach is motivated by the problem of computing coordination policies for correlated but distributed sensors. We demonstrate the performance of this learning scheme on a simulation of a wireless body sensor network. The results show that the AMRL algorithm performs significantly better than a random policy and close to the optimal policy obtained by solving the global MDP, and that AMRL scales to networks with large state spaces.
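The abstract describes the core mechanism only at a high level: each node learns a local state-action table by reinforcement learning, and the tables are then averaged to form the local policies. The following Python sketch illustrates that idea under stated assumptions; the function names (`q_learning`, `average_max_policy`, `toy_step`), the use of tabular Q-learning, the greedy "max" action selection over the averaged table, and all hyperparameters are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def q_learning(env_step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1, horizon=50):
    """Tabular Q-learning on one node's local (partially observable) view.

    `env_step(s, a) -> (next_state, reward)` is an assumed interface to the
    node's local environment model.
    """
    q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng()
    for _ in range(episodes):
        s = rng.integers(n_states)            # assumed random initial state
        for _ in range(horizon):
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))   # explore
            else:
                a = int(np.argmax(q[s]))           # exploit
            s_next, r = env_step(s, a)             # local transition and reward
            q[s, a] += alpha * (r + gamma * np.max(q[s_next]) - q[s, a])
            s = s_next
    return q


def average_max_policy(q_tables):
    """Average the per-node state-action tables, then act greedily (max)."""
    q_avg = np.mean(q_tables, axis=0)
    return np.argmax(q_avg, axis=1)           # one local action per state


# Hypothetical usage: two nodes learning on a toy local environment.
def toy_step(s, a):
    return (s + a) % 4, float(a == s % 2)

tables = [q_learning(toy_step, n_states=4, n_actions=2) for _ in range(2)]
policy = average_max_policy(tables)
```

Averaging the tables before taking the max keeps each node's executable policy small, which is consistent with the scalability argument made in the abstract, though the exact averaging scheme used by the authors is not specified there.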