Abstract
The solution to a Multi-Objective Reinforcement Learning problem is a set of Pareto-optimal policies. MPQ-learning is a recent algorithm that approximates the set of all Pareto-optimal deterministic policies by directly generalizing Q-learning to the multiobjective setting. In this paper we present a modification of MPQ-learning that avoids useless cyclical policies and thus reduces the number of training steps required for convergence.
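In the multiobjective setting, value estimates are vectors (one component per objective), and the learner keeps only the estimates not Pareto-dominated by another. As a minimal illustrative sketch (not the authors' implementation; the function names `dominates` and `pareto_front` are assumptions), the dominance test and pruning step can be written as:

```python
from typing import List, Tuple

Vector = Tuple[float, ...]

def dominates(u: Vector, v: Vector) -> bool:
    """True if u Pareto-dominates v: u is at least as good in every
    objective and strictly better in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(vectors: List[Vector]) -> List[Vector]:
    """Keep only the non-dominated value vectors."""
    return [v for v in vectors
            if not any(dominates(u, v) for u in vectors if u != v)]

# Example: with two objectives, (1, 2) and (2, 1) are incomparable,
# while (0, 0) is dominated by both.
front = pareto_front([(1, 2), (2, 1), (0, 0)])
```

A set-valued Q-learner applies a filter of this kind after each update so that only candidate Pareto-optimal policies are propagated.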
Supported by: the Spanish Government, Agencia Estatal de Investigación (AEI) and European Union, Fondo Europeo de Desarrollo Regional (FEDER), grant TIN2016-80774-R (AEI/FEDER, UE); and Plan Propio de Investigación de la Universidad de Málaga - Campus de Excelencia Internacional Andalucía Tech.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Mandow, L., Pérez-de-la-Cruz, J.L. (2018). Pruning Dominated Policies in Multiobjective Pareto Q-Learning. In: Herrera, F., et al. Advances in Artificial Intelligence. CAEPIA 2018. Lecture Notes in Computer Science, vol 11160. Springer, Cham. https://doi.org/10.1007/978-3-030-00374-6_23
DOI: https://doi.org/10.1007/978-3-030-00374-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00373-9
Online ISBN: 978-3-030-00374-6
eBook Packages: Computer Science, Computer Science (R0)