EXP4-DFDC: A Non-Stochastic Multi-Armed Bandit for Cache Replacement

Farzana Yusuf,Camilo Valdes,Vitalii Stebliankin,Guiseppe Vietri,Giri Narasimhan

EXP4-DFDC: A Non-Stochastic Multi-Armed Bandit for Cache Replacement

2020

Farzana Yusuf
Camilo Valdes
Vitalii Stebliankin
Guiseppe Vietri
Giri Narasimhan

In this work we study a variant of the well-known multi-armed bandit (MAB) problem, which has the properties of a delay in feedback, and a loss that declines over time. We introduce an algorithm, EXP4-DFDC, to solve this MAB variant, and demonstrate that the regret vanishes as the time increases. We also show that LeCaR, a previously published machine learning-based cache replacement algorithm, is an instance of EXP4-DFDC. Our results can be used to provide insight on the choice of hyperparameters, and optimize future LeCaR instances.

Keywords:

Regret
Mathematical optimization
Hyperparameter
Computer science
Multi-armed bandit
Cache

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations