Scalarized and Pareto Knowledge Gradient for Multi-objective Multi-armed Bandits

Chapter · Transactions on Computational Collective Intelligence XX

Part of the book series: Lecture Notes in Computer Science (TCCI, volume 9420)

Abstract

A multi-objective multi-armed bandit (MOMAB) problem is a sequential decision process with stochastic reward vectors. We extend the knowledge gradient (KG) policy to the MOMAB problem and propose the Pareto-KG and scalarized-KG algorithms. Pareto-KG trades off exploration and exploitation by combining the KG policy with the Pareto dominance relation. Scalarized-KG uses a linear or non-linear scalarization function to convert the MOMAB problem into a single-objective multi-armed bandit problem and applies the KG policy to trade off exploration and exploitation. To measure the performance of the proposed algorithms, we introduce three regret measures. We empirically compare the KG policy with the UCB1 policy on a test suite of MOMAB problems with normally distributed rewards; Pareto-KG and scalarized-KG achieve the best empirical performance.
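The abstract describes both algorithms only at a high level. As a rough illustration of the ideas involved, the sketch below combines the standard online knowledge-gradient index for normally distributed rewards (in the form described by Powell and Ryzhov) with a linear scalarization step and a Pareto-dominance filter. This is a minimal sketch under stated assumptions, not the chapter's exact formulation: the function names, the independence assumption used to scalarize the standard deviations, and the uniform tie-breaking over the Pareto set are all illustrative.

```python
import numpy as np
from scipy.stats import norm


def kg_index(means, stds, counts, horizon_left):
    """Online KG index for normally distributed rewards (one objective).

    means, stds, counts: per-arm sample mean, sample standard deviation,
    and number of pulls. horizon_left: remaining plays, used to weight
    the value of information as in online KG.
    """
    # Predictive reduction in uncertainty from one more pull of each arm.
    sigma_tilde = np.maximum(stds / np.sqrt(counts + 1), 1e-12)
    # Best competing mean for each arm.
    best_other = np.array([np.max(np.delete(means, i)) for i in range(len(means))])
    z = -np.abs(means - best_other) / sigma_tilde
    f = z * norm.cdf(z) + norm.pdf(z)  # expected improvement of the posterior mean
    return means + horizon_left * sigma_tilde * f


def scalarized_kg_choice(mean_vecs, std_vecs, counts, weights, horizon_left):
    """Linear scalarization: collapse reward vectors to scalars, then run KG."""
    means = mean_vecs @ weights
    # Spread of the weighted sum, assuming independent objectives.
    stds = np.sqrt((std_vecs ** 2) @ (weights ** 2))
    return int(np.argmax(kg_index(means, stds, counts, horizon_left)))


def pareto_kg_choice(mean_vecs, std_vecs, counts, horizon_left, rng):
    """Pareto-KG: add a KG-style bonus per objective, then choose uniformly
    among arms whose augmented mean vectors are not Pareto-dominated."""
    n_arms, n_obj = mean_vecs.shape
    augmented = np.column_stack([
        kg_index(mean_vecs[:, d], std_vecs[:, d], counts, horizon_left)
        for d in range(n_obj)
    ])
    # Arm i is dominated if some arm j is at least as good in every
    # objective and strictly better in at least one.
    non_dominated = [
        i for i in range(n_arms)
        if not any(
            np.all(augmented[j] >= augmented[i]) and np.any(augmented[j] > augmented[i])
            for j in range(n_arms) if j != i
        )
    ]
    return int(rng.choice(non_dominated))


# Toy usage: 5 arms, 2 objectives, flat starting estimates.
rng = np.random.default_rng(0)
mean_vecs = rng.normal(size=(5, 2))
std_vecs = np.ones((5, 2))
counts = np.ones(5)
print(scalarized_kg_choice(mean_vecs, std_vecs, counts, np.array([0.5, 0.5]), 100))
print(pareto_kg_choice(mean_vecs, std_vecs, counts, 100, rng))
```

Note the structural difference the abstract points at: the scalarized variant reduces each reward vector to a single number before the KG computation, so any single-objective bandit policy could be substituted at that point, while the Pareto variant keeps the objectives separate and compares arms only through the dominance relation.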



Author information

Correspondence to Saba Yahyaa.


Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Yahyaa, S., Drugan, M.M., Manderick, B. (2015). Scalarized and Pareto Knowledge Gradient for Multi-objective Multi-armed Bandits. In: Nguyen, N., Kowalczyk, R., Duval, B., van den Herik, J., Loiseau, S., Filipe, J. (eds) Transactions on Computational Collective Intelligence XX. Lecture Notes in Computer Science, vol 9420. Springer, Cham. https://doi.org/10.1007/978-3-319-27543-7_5

  • DOI: https://doi.org/10.1007/978-3-319-27543-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27542-0

  • Online ISBN: 978-3-319-27543-7

  • eBook Packages: Computer Science (R0)
