Nash Equilibrium of Social-Learning Agents in a Restless Multiarmed Bandit Game

2017 
We study a simple model of social-learning agents in a restless multiarmed bandit (rMAB). The bandit has one good arm, which changes to a bad one with a certain probability. To seek the good arm, each agent stochastically selects one of two methods: random search (individual learning) or copying information from other agents (social learning). The fitness of an agent is the probability that it knows the good arm in the steady state of the agent system. In this model, we explicitly construct the unique Nash equilibrium state and show that the corresponding strategy for each agent is an evolutionarily stable strategy (ESS) in the sense of Thomas. We show that the fitness of an agent using the ESS is superior to that of an asocial learner when the success probability of social learning exceeds a threshold determined by the success probability of individual learning, the probability that the state of the rMAB changes, and the number of agents. The ESS Nash equilibrium is a solution to Rogers’ paradox.
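As an illustration only, the kind of model described above can be sketched as a small Monte Carlo simulation. The abstract does not specify the update rule, so everything here is an assumption: the function name `simulate_fitness`, the parameters `r` (probability of choosing social learning), `q_ind` and `q_soc` (success probabilities of individual and social learning), `p_change` (probability that the good arm changes), the synchronous update, and the rule that an agent copies one randomly chosen other agent.

```python
import random

def simulate_fitness(n_agents=20, r=0.5, q_ind=0.1, q_soc=0.5,
                     p_change=0.05, steps=20000, seed=0):
    """Estimate the mean fraction of time an agent knows the good arm.

    Hypothetical dynamics (not taken from the paper):
      1. With probability p_change the good arm changes and all
         agents lose their knowledge.
      2. Each agent that does not know the good arm chooses social
         learning with probability r, otherwise individual learning.
      3. Individual learning succeeds with probability q_ind.
      4. Social learning picks one other agent at random and succeeds
         with probability q_soc if that agent knew the good arm.
    Requires n_agents >= 2.
    """
    rng = random.Random(seed)
    knows = [False] * n_agents
    total = 0
    for _ in range(steps):
        # Step 1: the environment may change, invalidating all knowledge.
        if rng.random() < p_change:
            knows = [False] * n_agents
        prev = knows[:]  # snapshot so updates within a step are synchronous
        for i in range(n_agents):
            if prev[i]:
                continue  # already knows the good arm
            if rng.random() < r:
                # Social learning: copy from a random agent other than i.
                j = rng.randrange(n_agents - 1)
                if j >= i:
                    j += 1
                if prev[j] and rng.random() < q_soc:
                    knows[i] = True
            elif rng.random() < q_ind:
                # Individual learning: random search succeeds.
                knows[i] = True
        total += sum(knows)
    return total / (steps * n_agents)

if __name__ == "__main__":
    for r in (0.0, 0.5, 1.0):
        print(f"r = {r}: fitness = {simulate_fitness(r=r):.3f}")
```

Under these assumed dynamics, a population of pure social learners (`r = 1.0`) that starts with no knowledge never finds the good arm, because there is nobody to copy from. A population of pure individual learners (`r = 0.0`) settles near `q_ind / (1 - (1 - q_ind) * (1 - p_change))`, the steady state of the single-agent chain in this sketch.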