Opportunistic Spectrum Access: Online Search of Optimality

2008 
This paper presents an online tuning approach for the ad hoc reinforcement learning algorithms used to solve the exploration-exploitation dilemma of opportunistic spectrum access in dynamic environments. These algorithms originate from a well-known problem in computer science, the multi-armed bandit (MAB) problem, and they have proven to be viable solutions for the detection and exploration of white spaces in opportunistic spectrum access. Previous work (A. Ben Hadj Alaya-Feki et al., 2008) has shown that reinforcement learning solutions to the MAB problem are very sensitive to the statistical properties of wireless medium access and therefore need careful tuning as the wireless environment varies. This paper addresses the online tuning of those algorithms by proposing and assessing two approaches: (1) a meta-learning approach, in which a second learner (the meta-learner) learns the parameters of the base learner, and (2) the Exp3 algorithm, which has previously been proposed for dynamic tuning of MAB parameters in other contexts. Simulation results obtained on an IEEE 802.11 medium access scenario show that one of the proposed meta-learning methods, namely the change point detection method, performs substantially better than the other methods.
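For readers unfamiliar with Exp3, the following is a minimal sketch of the standard algorithm, not the paper's implementation. Each arm can be read as a candidate channel or as a candidate parameter setting of a base learner. The class name `Exp3`, the helper `run_demo`, the exploration rate `gamma = 0.1`, and the channel availability probabilities are all illustrative assumptions and do not come from the paper.

```python
import math
import random


class Exp3:
    """Standard Exp3 bandit learner (illustrative sketch)."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n_arms = n_arms
        self.gamma = gamma              # exploration rate, assumed value
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n_arms
                for w in self.weights]

    def select(self):
        probs = self.probabilities()
        return self.rng.choices(range(self.n_arms), weights=probs)[0]

    def update(self, arm, reward):
        # Importance-weighted reward estimate; reward must lie in [0, 1].
        probs = self.probabilities()
        estimate = reward / probs[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale so the largest weight is 1; this avoids overflow and
        # leaves the selection probabilities unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def run_demo(steps=5000, seed=1):
    # Hypothetical probabilities that each of three channels is free.
    free_prob = [0.2, 0.5, 0.9]
    env = random.Random(seed)
    learner = Exp3(len(free_prob), gamma=0.1, seed=seed + 1)
    for _ in range(steps):
        arm = learner.select()
        reward = 1.0 if env.random() < free_prob[arm] else 0.0
        learner.update(arm, reward)
    return learner.probabilities()


if __name__ == "__main__":
    print(run_demo())
```

In this toy setting the selection probability concentrates on the channel that is most often free, while the `gamma / n_arms` floor keeps every arm sampled occasionally. That floor is what lets Exp3 keep reacting when the environment changes.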