TAAC: Temporally Abstract Actor-Critic for Continuous Control
2021
We propose temporally abstract actor-critic (TAAC), an off-policy RL algorithm that incorporates closed-loop temporal abstraction into the actor-critic framework in a simple manner. TAAC adds a second-stage binary policy that chooses between repeating the previous action and taking a new action output by an actor. Crucially, this act-or-repeat decision hinges on the action actually sampled rather than on the expected behavior of the actor. This post-acting switching scheme lets the overall policy make more informed decisions. TAAC has two important features: persistent exploration and a new compare-through Q operator for multi-step TD backup. We demonstrate TAAC's advantages over several strong baselines on 14 continuous control tasks spanning 5 categories. Code is available at this https URL.
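The act-or-repeat step described above can be illustrated with a minimal sketch. It assumes the binary choice is a softmax over the critic's values of the two candidate actions; the abstract does not state this parameterization, and the names `act_or_repeat`, `q_fn`, and `temperature` are hypothetical, not taken from the authors' code.

```python
import math
import random


def act_or_repeat(q_fn, state, prev_action, new_action, temperature=1.0, rng=random):
    """Choose between repeating prev_action and switching to new_action.

    The decision is made after new_action has been sampled from the actor,
    so it depends on the concrete candidate, not on the actor's expected behavior.
    Assumption: the switch probability is a two-way softmax over Q-values.
    """
    q_repeat = q_fn(state, prev_action)
    q_switch = q_fn(state, new_action)

    # Subtract the max before exponentiating for numerical stability.
    m = max(q_repeat, q_switch)
    w_repeat = math.exp((q_repeat - m) / temperature)
    w_switch = math.exp((q_switch - m) / temperature)
    p_switch = w_switch / (w_repeat + w_switch)

    chosen = new_action if rng.random() < p_switch else prev_action
    return chosen, p_switch


# Toy critic (hypothetical): prefers actions close to 1.0.
def toy_q(state, action):
    return -abs(action - 1.0)


action, p_switch = act_or_repeat(toy_q, state=0.0, prev_action=0.0, new_action=0.9)
```

In this toy setting the sampled action 0.9 scores higher than the previous action 0.0, so the switch probability is above one half; a candidate that scored worse than the previous action would more likely be discarded in favor of repeating.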