Online Kernel Selection with Multiple Bandit Feedbacks in Random Feature Space
Online kernel selection is critical to online kernel learning and must address the exploration-exploitation dilemma: we explore new kernels to find the best one while exploiting the kernel that has performed best so far. In this paper, we propose a novel multi-armed bandit solution to this dilemma. We first map each candidate kernel to an arm of a multi-armed bandit problem. Unlike typical multi-armed bandit models, where only one arm is selected at each round, we sample multiple kernels with replacement according to a probability distribution. We then make predictions with the hypotheses learned in the random feature spaces specified by the selected kernels, and incur multiple losses, which we refer to as multiple bandit feedbacks. Finally, we use all the feedbacks to update the probability distribution. We prove that the proposed approach enjoys a sub-linear expected regret bound. Experimental results on benchmark datasets show that the proposed approach performs comparably to existing online kernel selection methods.
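The round structure described in the abstract (sample several kernels with replacement, predict in each selected random feature space, collect one loss per selected kernel, then update the sampling distribution) can be illustrated with a minimal sketch. It is not the paper's algorithm, and every concrete choice in it is an assumption made for illustration: Gaussian candidate kernels approximated by random Fourier features, hinge loss with a perceptron-style update, an Exp3-style exponential-weights update with uniform exploration, an importance weight based on the probability of being sampled at least once, and the hypothetical names `make_rff`, `distribution`, and `run_selection`.

```python
import math
import random


def make_rff(dim, n_features, sigma, rng):
    """Random Fourier feature map approximating a Gaussian kernel of width sigma."""
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(n_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def phi(x):
        return [scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return phi


def distribution(log_w, gamma):
    """Exponential weights mixed with uniform exploration (assumed, Exp3-style)."""
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    k = len(log_w)
    return [(1.0 - gamma) * wi / total + gamma / k for wi in w]


def run_selection(stream, sigmas, n_features=20, m=2, eta=0.1, lr=0.5,
                  gamma=0.1, seed=0):
    """Run the sketch over (x, y) pairs with y in {-1, +1}; return the final distribution."""
    rng = random.Random(seed)
    dim = len(stream[0][0])
    k = len(sigmas)
    maps = [make_rff(dim, n_features, s, rng) for s in sigmas]
    hyps = [[0.0] * n_features for _ in range(k)]  # one linear hypothesis per kernel
    log_w = [0.0] * k
    for x, y in stream:
        p = distribution(log_w, gamma)
        # Sample m kernels with replacement; duplicates collapse to one feedback each.
        chosen = set(rng.choices(range(k), weights=p, k=m))
        for i in chosen:
            z = maps[i](x)
            margin = y * sum(a * b for a, b in zip(hyps[i], z))
            loss = max(0.0, 1.0 - margin)  # hinge loss (assumed)
            if loss > 0.0:
                hyps[i] = [a + lr * y * b for a, b in zip(hyps[i], z)]
            # Probability that kernel i appears at least once among the m draws.
            hit = 1.0 - (1.0 - p[i]) ** m
            # Importance-weighted loss, clipped to [0, 1] before weighting (assumed).
            log_w[i] -= eta * min(loss, 1.0) / hit
    return distribution(log_w, gamma)


# Toy usage: two separable points repeated, three candidate kernel widths.
stream = [((1.0, 0.0), 1), ((-1.0, 0.0), -1)] * 20
p = run_selection(stream, sigmas=[0.1, 1.0, 10.0])
print(p)
```

Using every selected kernel's loss in each round gives the update more information than a single-arm bandit would. Collapsing duplicate draws into a set is one simple way to handle sampling with replacement; the paper may treat duplicates differently.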
Keywords: Online kernel selection; Exploration-exploitation dilemma; Multiple bandit feedbacks; Random feature space
The work was supported in part by the National Natural Science Foundation of China under grant No. 61673293.