Abstract
A multi-objective multi-armed bandit (MOMAB) problem is a sequential decision process with stochastic reward vectors. We extend the knowledge gradient (KG) policy to the MOMAB problem and propose the Pareto-KG and scalarized-KG algorithms. Pareto-KG trades off exploration and exploitation by combining the KG policy with the Pareto dominance relation. Scalarized-KG uses a linear or non-linear scalarization function to convert the MOMAB problem into a single-objective multi-armed bandit problem and applies the KG policy to trade off exploration and exploitation. To measure the performance of the proposed algorithms, we introduce three regret measures. We empirically compare the KG policy with the UCB1 policy on a test suite of MOMAB problems with normally distributed rewards. Pareto-KG and scalarized-KG achieve the best empirical performance.
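As background, the two ingredients the abstract relies on can be sketched briefly: a Pareto dominance test between mean-reward vectors, and a linear scalarization that collapses a reward vector to a scalar so that standard single-objective bandit policies apply. This is a minimal illustration under our own naming, not the chapter's algorithms; the function names and example vectors are ours.

```python
def dominates(u, v):
    """True if vector u Pareto-dominates v: u is at least as good as v
    in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))

def linear_scalarize(reward_vector, weights):
    """Collapse a reward vector to a scalar via a weight vector,
    turning the multi-objective bandit into a single-objective one."""
    return sum(w * r for w, r in zip(weights, reward_vector))

# Example mean-reward vectors for three arms (two objectives):
# arm_a dominates arm_b, but arm_a and arm_c are Pareto-incomparable.
arm_a, arm_b, arm_c = [0.8, 0.6], [0.5, 0.4], [0.9, 0.1]
```

With equal weights `[0.5, 0.5]`, `linear_scalarize(arm_a, [0.5, 0.5])` gives 0.7; note that a single weight vector can miss Pareto-optimal arms in non-convex regions, which is why non-linear scalarizations are also considered.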
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Yahyaa, S., Drugan, M.M., Manderick, B. (2015). Scalarized and Pareto Knowledge Gradient for Multi-objective Multi-armed Bandits. In: Nguyen, N., Kowalczyk, R., Duval, B., van den Herik, J., Loiseau, S., Filipe, J. (eds) Transactions on Computational Collective Intelligence XX. Lecture Notes in Computer Science, vol 9420. Springer, Cham. https://doi.org/10.1007/978-3-319-27543-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27542-0
Online ISBN: 978-3-319-27543-7