Abstract
Non-stationary domains, which change in unpredictable ways, are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named Online ASP for MDP (oASP(MDP)): a method that constructs the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules encode a set of domain constraints that are processed as ASP programs, thereby reducing the search space. Results show that oASP(MDP) finds solutions for problems in non-stationary domains without interfering with the action-value function approximation process.
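The core idea described above, pruning the learner's search space with constraints before value-function approximation, can be illustrated with a minimal sketch. The code below is not the authors' implementation: it uses plain tabular Q-learning on a hypothetical 4x4 grid world, and a Python set of blocked cells stands in for the answer sets an ASP solver would compute from the constraint rules; a domain change is modeled simply as an updated constraint set over which the agent relearns.

```python
import random

GRID = 4                      # 4x4 grid; states are (row, col) tuples
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GOAL = (3, 3)

def learn(blocked, episodes=2000, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning restricted to the states not ruled out by `blocked`.

    `blocked` plays the role of the domain constraints: states it contains are
    excluded from the state set before learning, shrinking the search space.
    """
    rng = random.Random(seed)
    states = [(r, c) for r in range(GRID) for c in range(GRID)
              if (r, c) not in blocked]
    q = {(s, a): 0.0 for s in states for a in ACTIONS}

    def step(s, a):
        dr, dc = ACTIONS[a]
        nxt = (s[0] + dr, s[1] + dc)
        if nxt not in states:     # off-grid or constrained out: stay put
            nxt = s
        return nxt, (1.0 if nxt == GOAL else -0.04)

    for _ in range(episodes):
        s = (0, 0)
        while s != GOAL:
            if rng.random() < eps:
                a = rng.choice(list(ACTIONS))
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            nxt, r = step(s, a)
            best = max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best - q[(s, a)])
            s = nxt
    return q

# Initial domain, then a change observed by the agent (two new obstacles)
# expressed as an updated constraint set; relearning proceeds over the
# reduced state space rather than the full grid.
q1 = learn(blocked=set())
q2 = learn(blocked={(1, 1), (2, 2)})
```

In the paper's setting the constraint set would come from solving an ASP program rather than being hand-written, and the previously learnt action values for unaffected states could be retained instead of relearnt from scratch; the sketch only shows how constraints shrink the table the learner has to fill.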
P. E. Santos—Supported by PITE FAPESP-IBM grant 2016/18792-9.
R. L. de Mantaras—Partially supported by Generalitat de Catalunya 2017 SGR 172.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Ferreira, L.A., Bianchi, R.A.C., Santos, P.E., de Mantaras, R.L. (2018). A Method for the Online Construction of the Set of States of a Markov Decision Process Using Answer Set Programming. In: Mouhoub, M., Sadaoui, S., Ait Mohamed, O., Ali, M. (eds) Recent Trends and Future Technology in Applied Intelligence. IEA/AIE 2018. Lecture Notes in Computer Science(), vol 10868. Springer, Cham. https://doi.org/10.1007/978-3-319-92058-0_1
Print ISBN: 978-3-319-92057-3
Online ISBN: 978-3-319-92058-0