Robust Bayesian Kernel Machine Via Stein Variational Gradient Descent For Big Data

Authors:
Khanh Nguyen Deakin University
Trung Le PRaDA, Deakin University, Australia
Tu Dinh Nguyen Deakin University
Dinh Phung Deakin University
Geoffrey Webb Monash University

Introduction:

Most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization. The authors propose a robust Bayesian Kernel Machine (BKM) - a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods.

Abstract:

Kernel methods are powerful supervised machine learning models, prized for their strong ability to generalize to unseen data even when training data are limited. However, most kernel methods, including the state-of-the-art LIBSVM, are vulnerable to the curse of kernelization, making them infeasible to apply to large-scale datasets. This issue is exacerbated when kernel methods are used in conjunction with a grid search to tune their kernel parameters and hyperparameters, which raises the question of model robustness when applied to real datasets. In this paper, we propose a robust Bayesian Kernel Machine (BKM) - a Bayesian kernel machine that exploits the strengths of both Bayesian modelling and kernel methods. A key challenge for such a formulation is the need for an efficient learning algorithm. To this end, we successfully extended the recent Stein variational theory for Bayesian inference to our proposed model, resulting in fast and efficient learning and prediction algorithms. Importantly, our proposed BKM is resilient to the curse of kernelization, making it applicable to large-scale datasets and robust to parameter tuning, avoiding the expense and potential pitfalls associated with the current practice of parameter tuning. Our extensive experimental results on 12 benchmark datasets show that our BKM, without tuning any parameter, can achieve predictive performance comparable to the state-of-the-art LIBSVM and significantly outperforms other baselines, while obtaining a significant speedup in total training time compared with its rivals.
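The paper does not reproduce the inference algorithm here, but the core of Stein variational gradient descent (SVGD), on which BKM's learning builds, is a deterministic particle update: each particle is pushed by a kernel-weighted average of the score function plus a repulsive kernel-gradient term. Below is a minimal 1-D sketch of a generic SVGD step, not the authors' BKM algorithm; the function names, the RBF bandwidth `h`, and the standard-normal target are illustrative assumptions.

```python
import math
import random

def svgd_step(particles, grad_logp, h=1.0, eps=0.05):
    """One generic SVGD update (1-D sketch, not the paper's BKM code).

    Each particle x moves along
        phi(x) = (1/n) * sum_j [ k(x_j, x) * grad_logp(x_j) + d/dx_j k(x_j, x) ],
    i.e. a kernel-smoothed score term plus a repulsive term that keeps
    particles from collapsing onto the mode.
    """
    n = len(particles)
    updated = []
    for x in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - x) ** 2) / h)   # RBF kernel k(xj, x)
            dk = 2.0 * (x - xj) / h * k          # d k(xj, x) / d xj (repulsive term)
            phi += k * grad_logp(xj) + dk
        updated.append(x + eps * phi / n)
    return updated

# Toy target: standard normal, so grad log p(x) = -x.
random.seed(0)
particles = [random.uniform(-4.0, 4.0) for _ in range(50)]
for _ in range(500):
    particles = svgd_step(particles, lambda x: -x)
mean = sum(particles) / len(particles)
```

After enough steps the particle cloud approximates the target posterior; predictions are then averages over particles, which is what lets the approach avoid per-dataset grid search over hyperparameters.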
