TAAC: Temporally Abstract Actor-Critic for Continuous Control

2021 
We propose temporally abstract actor-critic (TAAC), an off-policy RL algorithm that incorporates closed-loop temporal abstraction into the actor-critic framework in a simple manner. TAAC adds a second-stage binary policy that chooses between the previous action and a new action output by the actor. Crucially, this act-or-repeat decision hinges on the actually sampled action instead of the expected behavior of the actor. This post-acting switching scheme lets the overall policy make more informed decisions. TAAC has two important features: persistent exploration and a new compare-through Q operator for multi-step TD backup. We demonstrate TAAC's advantages over several strong baselines on 14 continuous control tasks spanning 5 different categories. Code is available at this https URL.
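To make the post-acting act-or-repeat idea concrete, here is a minimal sketch. It assumes that the second-stage binary policy is a Boltzmann (softmax) distribution over the critic's Q values of the two concrete candidates: the previous action and the action the actor has just sampled. The names `switch_probability`, `act_or_repeat`, `toy_q`, and the `temperature` parameter are hypothetical illustrations, not taken from the TAAC code release. The point the sketch shows is the ordering: the new action is sampled first, and only then does the binary switch compare two actual actions rather than the actor's expected behavior.

```python
import math
import random
from typing import Callable, Optional, Sequence, Tuple

Action = Tuple[float, ...]
QFunction = Callable[[Sequence[float], Action], float]


def switch_probability(q_repeat: float, q_new: float, temperature: float) -> float:
    """Probability of executing the new action (b = 1).

    Assumption (not confirmed by the abstract): the binary policy is a
    two-way softmax over the critic's values of the two candidate actions.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(q_repeat, q_new)  # subtract the max for numerical stability
    e_repeat = math.exp((q_repeat - m) / temperature)
    e_new = math.exp((q_new - m) / temperature)
    return e_new / (e_repeat + e_new)


def act_or_repeat(
    state: Sequence[float],
    prev_action: Action,
    new_action: Action,
    q_fn: QFunction,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Tuple[Action, int]:
    """Post-acting switch: new_action has already been sampled by the actor;
    a binary variable b then picks between repeating prev_action (b = 0)
    and executing new_action (b = 1)."""
    rng = rng if rng is not None else random.Random()
    p_new = switch_probability(
        q_fn(state, prev_action), q_fn(state, new_action), temperature
    )
    b = 1 if rng.random() < p_new else 0
    return (new_action if b == 1 else prev_action), b


# Hypothetical toy critic: prefers actions close to 1.0 and ignores the state.
def toy_q(state: Sequence[float], action: Action) -> float:
    return -(action[0] - 1.0) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    action, b = act_or_repeat([0.0], (0.0,), (0.9,), toy_q, temperature=0.01, rng=rng)
    print(action, b)  # prints: (0.9,) 1
```

With a low temperature the switch almost always keeps whichever candidate the critic scores higher; with equal Q values it repeats or switches with probability 0.5 each. Because the previous action stays available as a candidate at every step, the agent can keep repeating it for several steps, which is one way to read the "persistent exploration" feature mentioned above.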