
A Method for the Online Construction of the Set of States of a Markov Decision Process Using Answer Set Programming

  • Conference paper
  • First Online:
Recent Trends and Future Technology in Applied Intelligence (IEA/AIE 2018)

Abstract

Non-stationary domains, which change in unpredictable ways, pose a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named Online ASP for MDP (oASP(MDP)), a method capable of constructing the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules encode a set of domain constraints that are processed as ASP programs, reducing the search space. Results show that oASP(MDP) is capable of finding solutions for problems in non-stationary domains without interfering with the action-value function approximation process.
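The core idea of the abstract, constructing the MDP state set online while learning and excluding states ruled out by observed domain changes, can be illustrated with a minimal sketch. This is not the authors' implementation: the grid world, the Q-learning agent, and the `constraints` set (a stand-in for the ASP rules that a solver would ground) are all assumptions made here for illustration only.

```python
import random

# Illustrative sketch, NOT the paper's implementation: a tabular Q-learning
# agent on a small grid world whose state set is constructed online. When the
# agent observes a domain change (an obstacle), it records a constraint --
# standing in for an ASP rule -- and drops that state from its table,
# shrinking the search space for subsequent value updates.

GRID = 4                                  # 4x4 grid; states are (x, y)
GOAL = (3, 3)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

class OnlineStateAgent:
    def __init__(self, alpha=0.5, gamma=0.9, eps=0.2):
        self.q = {}               # state set built online: visited states only
        self.constraints = set()  # states excluded by observed domain changes
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def valid(self, s):
        x, y = s
        return 0 <= x < GRID and 0 <= y < GRID and s not in self.constraints

    def q_values(self, s):
        # A state enters the state set the first time it is visited.
        return self.q.setdefault(s, {a: 0.0 for a in ACTIONS})

    def act(self, s):
        qs = self.q_values(s)
        if random.random() < self.eps:
            return random.choice(ACTIONS)
        return max(qs, key=qs.get)

    def observe_obstacle(self, s):
        # Analogue of asserting a new constraint for an observed domain
        # change: the state leaves the state set and is never expanded again.
        self.constraints.add(s)
        self.q.pop(s, None)

    def update(self, s, a, r, nxt):
        best = max(self.q_values(nxt).values()) if self.valid(nxt) else 0.0
        qs = self.q_values(s)
        qs[a] += self.alpha * (r + self.gamma * best - qs[a])

def episode(agent, obstacles, max_steps=50):
    s = (0, 0)
    for _ in range(max_steps):
        a = agent.act(s)
        nxt = (s[0] + a[0], s[1] + a[1])
        if nxt in obstacles:
            agent.observe_obstacle(nxt)   # domain change observed online
        if not agent.valid(nxt):
            nxt = s                       # blocked move: stay in place
        r = 10.0 if nxt == GOAL else -1.0
        agent.update(s, a, r, nxt)
        if nxt == GOAL:
            break
        s = nxt

random.seed(0)
agent = OnlineStateAgent()
for _ in range(200):
    episode(agent, obstacles={(1, 1), (2, 2)})
print(len(agent.q), "states;", len(agent.constraints), "constrained")
```

In oASP(MDP) the constraint set would instead be an ASP program whose answer sets define the valid states; the sketch only mirrors the control flow of interleaving online state-set construction with action-value updates.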

P. E. Santos—Supported by PITE FAPESP-IBM grant 2016/18792-9.

R. L. de Mantaras—Partially supported by Generalitat de Catalunya 2017 SGR 172.



Author information

Corresponding author

Correspondence to Leonardo Anjoletto Ferreira.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Ferreira, L.A., Bianchi, R.A.C., Santos, P.E., de Mantaras, R.L. (2018). A Method for the Online Construction of the Set of States of a Markov Decision Process Using Answer Set Programming. In: Mouhoub, M., Sadaoui, S., Ait Mohamed, O., Ali, M. (eds.) Recent Trends and Future Technology in Applied Intelligence. IEA/AIE 2018. Lecture Notes in Computer Science, vol 10868. Springer, Cham. https://doi.org/10.1007/978-3-319-92058-0_1

  • DOI: https://doi.org/10.1007/978-3-319-92058-0_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92057-3

  • Online ISBN: 978-3-319-92058-0

  • eBook Packages: Computer Science (R0)
