On the impact of tangled program graph marking schemes under the atari reinforcement learning benchmark

2021 
Tangled program graphs (TPG) support emergent modularity by first identifying subsets of programs that can usefully coexist (a team/ graph node) and then identifying the circumstance under which to reference other teams (arc adaptation). Variation operators manipulate the content of teams and arcs. This introduces cycles into the TPG structures. Previously, this effect was eradicated at run time by marking nodes while evaluating TPG individuals. In this work, a new marking heuristic is introduced, that of arc (learner) marking. This means that nodes can be revisited, but not the same arcs. We investigate the impact of this through 18 titles from the Arcade Learning Environment. The performance and complexity of the policies appear to be similar, but with specific tasks (game titles) resulting in preferences for one scheme over another.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    4
    References
    1
    Citations
    NaN
    KQI
    []