Near Optimal Adversarial Attack on UCB Bandits

Shiliang Zuo

2020

We consider a stochastic multi-armed bandit problem in which rewards are subject to adversarial corruption. We propose a novel attack strategy that manipulates a learner following the UCB principle into pulling some non-optimal target arm $T - o(T)$ times, with a cumulative cost that scales as $\sqrt{\log T}$, where $T$ is the number of rounds. We also prove the first lower bound on the cumulative attack cost. Our lower bound matches our upper bound up to $\log \log T$ factors, showing that our attack is near optimal.
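To make the setting concrete, the following is a minimal simulation sketch of a UCB learner whose observed rewards are corrupted by an attacker. It is not the attack strategy proposed in the paper, which the abstract does not specify. It uses a simplified heuristic that caps each non-target arm's empirical mean below the target arm's empirical mean. The function name `run_ucb_under_attack`, the `margin` parameter, the Gaussian reward noise, the UCB1-style index, and the assumption that corrupted rewards may be arbitrarily negative are all illustrative choices, not taken from the source.

```python
import math
import random


def run_ucb_under_attack(means, target, horizon, margin=0.5, seed=0):
    """Simulate a UCB1-style learner against a naive reward-corruption attacker.

    Hypothetical illustration only: the attacker lowers the reward of any
    non-target arm so that its empirical mean stays at least `margin`
    below the target arm's current empirical mean.

    Returns (pull counts per arm, cumulative attack cost).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # number of pulls per arm
    sums = [0.0] * k    # sum of rewards as seen by the learner (post-corruption)
    cost = 0.0          # total amount of corruption injected by the attacker

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )

        # Assumed reward model: Gaussian noise with standard deviation 0.1
        reward = rng.gauss(means[arm], 0.1)

        if arm != target and counts[target] > 0:
            # Largest reward that keeps this arm's new empirical mean <= cap
            cap = sums[target] / counts[target] - margin
            allowed = cap * (counts[arm] + 1) - sums[arm]
            corrupted = min(reward, allowed)
            cost += reward - corrupted
            reward = corrupted

        counts[arm] += 1
        sums[arm] += reward

    return counts, cost


if __name__ == "__main__":
    counts, cost = run_ucb_under_attack([0.9, 0.5, 0.2], target=2, horizon=5000)
    print("pulls per arm:", counts)
    print("cumulative attack cost:", cost)
```

In this sketch the target arm (index 2, the arm with the lowest true mean) should receive the large majority of pulls, while `cost` tracks the total corruption spent. The sketch makes no claim about matching the $\sqrt{\log T}$ cost scaling stated in the abstract.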

Keywords: adversarial system, attack strategy, cumulative cost, upper and lower bounds, computer science, combinatorics
