Abstract
Non-stationary domains, which change in unpredictable ways, are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named Online ASP for MDP (oASP(MDP)): a method that constructs the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules encode a set of domain constraints that are processed as ASP programs, thereby reducing the search space. Results show that oASP(MDP) finds solutions for problems in non-stationary domains without interfering with the action-value function approximation process.
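The core idea described above, pruning the learner's search space with constraints before value-function approximation, can be illustrated with a minimal sketch. The code below is not the authors' implementation: it uses plain tabular Q-learning on a hypothetical 4x4 grid world, and a Python set of blocked cells stands in for the answer sets an ASP solver would compute from the constraint rules; a domain change is modeled simply as an updated constraint set over which the agent relearns.

```python
import random

GRID = 4                      # 4x4 grid; states are (row, col) tuples
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GOAL = (3, 3)

def learn(blocked, episodes=2000, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning restricted to the states not ruled out by `blocked`.

    `blocked` plays the role of the domain constraints: states it contains are
    excluded from the state set before learning, shrinking the search space.
    """
    rng = random.Random(seed)
    states = [(r, c) for r in range(GRID) for c in range(GRID)
              if (r, c) not in blocked]
    q = {(s, a): 0.0 for s in states for a in ACTIONS}

    def step(s, a):
        dr, dc = ACTIONS[a]
        nxt = (s[0] + dr, s[1] + dc)
        if nxt not in states:     # off-grid or constrained out: stay put
            nxt = s
        return nxt, (1.0 if nxt == GOAL else -0.04)

    for _ in range(episodes):
        s = (0, 0)
        while s != GOAL:
            if rng.random() < eps:
                a = rng.choice(list(ACTIONS))
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            nxt, r = step(s, a)
            best = max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best - q[(s, a)])
            s = nxt
    return q

# Initial domain, then a change observed by the agent (two new obstacles)
# expressed as an updated constraint set; relearning proceeds over the
# reduced state space rather than the full grid.
q1 = learn(blocked=set())
q2 = learn(blocked={(1, 1), (2, 2)})
```

In the paper's setting the constraint set would come from solving an ASP program rather than being hand-written, and the previously learnt action values for unaffected states could be retained instead of relearnt from scratch; the sketch only shows how constraints shrink the table the learner has to fill.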
P. E. Santos—Supported by PITE FAPESP-IBM grant 2016/18792-9.
R. L. de Mantaras—Partially supported by Generalitat de Catalunya 2017 SGR 172.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Ferreira, L.A., Bianchi, R.A.C., Santos, P.E., de Mantaras, R.L. (2018). A Method for the Online Construction of the Set of States of a Markov Decision Process Using Answer Set Programming. In: Mouhoub, M., Sadaoui, S., Ait Mohamed, O., Ali, M. (eds) Recent Trends and Future Technology in Applied Intelligence. IEA/AIE 2018. Lecture Notes in Computer Science(), vol 10868. Springer, Cham. https://doi.org/10.1007/978-3-319-92058-0_1
Print ISBN: 978-3-319-92057-3
Online ISBN: 978-3-319-92058-0