Scalarized and Pareto Knowledge Gradient for Multi-objective Multi-armed Bandits

Chapter · Transactions on Computational Collective Intelligence XX

Part of the book series: Lecture Notes in Computer Science (TCCI, volume 9420)

Abstract

A multi-objective multi-armed bandit (MOMAB) problem is a sequential decision process with stochastic reward vectors. We extend the knowledge gradient (KG) policy to the MOMAB problem and propose the Pareto-KG and scalarized-KG algorithms. Pareto-KG trades off exploration and exploitation by combining the KG policy with the Pareto dominance relation. Scalarized-KG uses a linear or non-linear scalarization function to convert the MOMAB problem into a single-objective multi-armed bandit problem and applies the KG policy to trade off exploration and exploitation. To measure the performance of the proposed algorithms, we introduce three regret measures. We empirically compare the KG policy with the UCB1 policy on a test suite of MOMAB problems with normally distributed rewards; Pareto-KG and scalarized-KG achieve the best empirical performance.
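The abstract describes both algorithms only at a high level. As a rough illustration of the ideas involved, the sketch below combines the standard online knowledge-gradient index for normally distributed rewards (in the form described by Powell and Ryzhov) with a linear scalarization step and a Pareto-dominance filter. This is a minimal sketch under stated assumptions, not the chapter's exact formulation: the function names, the independence assumption used to scalarize the standard deviations, and the uniform tie-breaking over the Pareto set are all illustrative.

```python
import numpy as np
from scipy.stats import norm


def kg_index(means, stds, counts, horizon_left):
    """Online KG index for normally distributed rewards (one objective).

    means, stds, counts: per-arm sample mean, sample standard deviation,
    and number of pulls. horizon_left: remaining plays, used to weight
    the value of information as in online KG.
    """
    # Predictive reduction in uncertainty from one more pull of each arm.
    sigma_tilde = np.maximum(stds / np.sqrt(counts + 1), 1e-12)
    # Best competing mean for each arm.
    best_other = np.array([np.max(np.delete(means, i)) for i in range(len(means))])
    z = -np.abs(means - best_other) / sigma_tilde
    f = z * norm.cdf(z) + norm.pdf(z)  # expected improvement of the posterior mean
    return means + horizon_left * sigma_tilde * f


def scalarized_kg_choice(mean_vecs, std_vecs, counts, weights, horizon_left):
    """Linear scalarization: collapse reward vectors to scalars, then run KG."""
    means = mean_vecs @ weights
    # Spread of the weighted sum, assuming independent objectives.
    stds = np.sqrt((std_vecs ** 2) @ (weights ** 2))
    return int(np.argmax(kg_index(means, stds, counts, horizon_left)))


def pareto_kg_choice(mean_vecs, std_vecs, counts, horizon_left, rng):
    """Pareto-KG: add a KG-style bonus per objective, then choose uniformly
    among arms whose augmented mean vectors are not Pareto-dominated."""
    n_arms, n_obj = mean_vecs.shape
    augmented = np.column_stack([
        kg_index(mean_vecs[:, d], std_vecs[:, d], counts, horizon_left)
        for d in range(n_obj)
    ])
    # Arm i is dominated if some arm j is at least as good in every
    # objective and strictly better in at least one.
    non_dominated = [
        i for i in range(n_arms)
        if not any(
            np.all(augmented[j] >= augmented[i]) and np.any(augmented[j] > augmented[i])
            for j in range(n_arms) if j != i
        )
    ]
    return int(rng.choice(non_dominated))


# Toy usage: 5 arms, 2 objectives, flat starting estimates.
rng = np.random.default_rng(0)
mean_vecs = rng.normal(size=(5, 2))
std_vecs = np.ones((5, 2))
counts = np.ones(5)
print(scalarized_kg_choice(mean_vecs, std_vecs, counts, np.array([0.5, 0.5]), 100))
print(pareto_kg_choice(mean_vecs, std_vecs, counts, 100, rng))
```

Note the structural difference the abstract points at: the scalarized variant reduces each reward vector to a single number before the KG computation, so any single-objective bandit policy could be substituted at that point, while the Pareto variant keeps the objectives separate and compares arms only through the dominance relation.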



Author information

Correspondence to Saba Yahyaa.


Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Yahyaa, S., Drugan, M.M., Manderick, B. (2015). Scalarized and Pareto Knowledge Gradient for Multi-objective Multi-armed Bandits. In: Nguyen, N., Kowalczyk, R., Duval, B., van den Herik, J., Loiseau, S., Filipe, J. (eds) Transactions on Computational Collective Intelligence XX. Lecture Notes in Computer Science, vol 9420. Springer, Cham. https://doi.org/10.1007/978-3-319-27543-7_5

  • DOI: https://doi.org/10.1007/978-3-319-27543-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27542-0

  • Online ISBN: 978-3-319-27543-7

  • eBook Packages: Computer Science (R0)
