Abstract
The solution to a Multi-Objective Reinforcement Learning problem is a set of Pareto-optimal policies. MPQ-learning is a recent algorithm that approximates the set of all Pareto-optimal deterministic policies by directly generalizing Q-learning to the multiobjective setting. In this paper we present a modification of MPQ-learning that avoids useless cyclical policies and thus reduces the number of training steps required for convergence.
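In the multiobjective setting, value estimates are vectors (one component per objective), and the learner keeps only the estimates not Pareto-dominated by another. As a minimal illustrative sketch (not the authors' implementation; the function names `dominates` and `pareto_front` are assumptions), the dominance test and pruning step can be written as:

```python
from typing import List, Tuple

Vector = Tuple[float, ...]

def dominates(u: Vector, v: Vector) -> bool:
    """True if u Pareto-dominates v: u is at least as good in every
    objective and strictly better in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(vectors: List[Vector]) -> List[Vector]:
    """Keep only the non-dominated value vectors."""
    return [v for v in vectors
            if not any(dominates(u, v) for u in vectors if u != v)]

# Example: with two objectives, (1, 2) and (2, 1) are incomparable,
# while (0, 0) is dominated by both.
front = pareto_front([(1, 2), (2, 1), (0, 0)])
```

A set-valued Q-learner applies a filter of this kind after each update so that only candidate Pareto-optimal policies are propagated.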
Supported by: the Spanish Government, Agencia Estatal de Investigación (AEI) and European Union, Fondo Europeo de Desarrollo Regional (FEDER), grant TIN2016-80774-R (AEI/FEDER, UE); and Plan Propio de Investigación de la Universidad de Málaga - Campus de Excelencia Internacional Andalucía Tech.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Mandow, L., Pérez-de-la-Cruz, J.L. (2018). Pruning Dominated Policies in Multiobjective Pareto Q-Learning. In: Herrera, F., et al. Advances in Artificial Intelligence. CAEPIA 2018. Lecture Notes in Computer Science, vol 11160. Springer, Cham. https://doi.org/10.1007/978-3-030-00374-6_23
DOI: https://doi.org/10.1007/978-3-030-00374-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00373-9
Online ISBN: 978-3-030-00374-6
eBook Packages: Computer Science, Computer Science (R0)