|Kechao Cai||The Chinese University of Hong Kong, Hong Kong|
|Xutong Liu||The Chinese University of Hong Kong, Hong Kong|
|Yuzhen Janice Chen||The Chinese University of Hong Kong, Hong Kong|
|John Chi Shing Lui||Chinese University of Hong Kong, Hong Kong|
Network application optimization is essential for improving the performance of the application as well as its user experience. The network application parameters are crucial in making proper decisions for network application optimizations. However, many works are impractical by assuming a priori knowledge of the parameters which are usually unknown and need to be estimated. There have been studies that consider optimizing network application in an online learning context using multi-armed bandit models. However, existing frameworks are problematic as they only consider to find the optimal decisions to minimize the regret, but neglect the constraints (or guarantee) requirements which may be excessively violated. In this paper, we propose a novel online learning framework for network application optimizations with guarantee. To the best of our knowledge, we are the first to formulate the stochastic constrained multi-armed bandit model with time-varying "multi-level rewards" by taking both "regret" and "violation" into consideration. We are also the first to design a constrained bandit policy, Learning with Minimum Guarantee (LMG), with provable sub-linear regret and violation bounds. We illustrate how our framework can be applied to several emerging network application optimizations, namely, (1) opportunistic multichannel selection, (2) data-guaranteed crowdsensing, and (3) stability-guaranteed crowdsourced transcoding. To show the effectiveness of LMG in optimizing these applications with different minimum requirements, we also conduct extensive simulations by comparing LMG with existing state-of-the-art policies.