Sparse online maximum entropy inverse reinforcement learning via proximal optimization and truncated gradient

2022 
-regularization and adaptive per-state learning rates, the proposed algorithm can select features and correct the update direction of the reward weights, reducing model complexity and avoiding overfitting while also speeding up convergence. During each iteration, the truncated gradient (TG) method is applied to ME-FTPRL IRL (the resulting algorithm is named ME-TFTPRL IRL) to update the reward weights, which avoids the floating-point problem of the FTPRL method.
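The truncated-gradient update referred to in the abstract is, in the standard formulation of Langford, Li and Zhang (2009), a plain gradient step followed by shrinking small weights toward zero to induce sparsity. The sketch below is an illustration of that generic TG step, not the paper's ME-TFTPRL IRL algorithm; the function name and the parameters `eta`, `lam`, and `theta` are assumptions chosen for the example.

```python
import numpy as np

def truncated_gradient_step(w, grad, eta, lam, theta=np.inf):
    """One truncated-gradient (TG) update (hypothetical sketch).

    w     : current weight vector
    grad  : gradient of the loss at w
    eta   : learning rate
    lam   : regularization strength (shrinkage per step is eta * lam)
    theta : truncation threshold; weights with |v| > theta are not shrunk
    """
    v = w - eta * grad                       # ordinary gradient step
    shrink = eta * lam                       # amount to pull toward zero
    out = v.copy()
    pos = (v >= 0) & (v <= theta)            # small non-negative weights
    neg = (v < 0) & (v >= -theta)            # small negative weights
    out[pos] = np.maximum(0.0, v[pos] - shrink)   # shrink, clip at zero
    out[neg] = np.minimum(0.0, v[neg] + shrink)   # shrink, clip at zero
    return out

# Small weights are driven exactly to zero (sparsity), large ones survive.
w = np.array([0.05, -0.5, 2.0])
print(truncated_gradient_step(w, np.zeros(3), eta=0.1, lam=1.0, theta=1.0))
```

Because weights inside `[-theta, theta]` are clipped at zero rather than merely shrunk, repeated TG steps produce exactly sparse weight vectors, which is the feature-selection effect the abstract describes.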