Online Learning with Graph-Structured Feedback against Adaptive Adversaries

Zhili Feng,Po-Ling Loh

Online Learning with Graph-Structured Feedback against Adaptive Adversaries

2018

Zhili Feng
Po-Ling Loh

We derive upper and lower bounds for the policy regret of $T$ -round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of $\mathcal{O}(T^{2/3})$ and $\tilde{\mathrm{O}}(T^{3/4})$ for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of $\tilde{\Omega}(T^{2/3})$ is achieved in the case of full-information feedback. We also study the particular loss structure of an oblivious adversary with switching costs, and show that in such a setting, nonrevealing strongly-observable feedback graphs achieve a lower bound of $\tilde{\Omega}(T^{2/3})$ , as well.

Keywords:

Discrete mathematics
Adversary
Tilde
Combinatorics
Regret
Mathematics
Upper and lower bounds
Bounded function
Graph
Omega
Mathematical optimization
online learning

Correction
Source
Cite
Save
Machine Reading By IdeaReader

References

Citations