Learning to Act Optimally in Partially Observable Markov Decision Processes Using Hybrid Probabilistic Logic Programs

Saad, Emad

doi:10.1007/978-3-642-23963-2_39

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6929)

Included in the following conference series:

International Conference on Scalable Uncertainty Management

Abstract

We present a probabilistic logic programming framework for reinforcement learning that is capable of representing domain-specific knowledge. The framework integrates reinforcement learning in POMDP environments with normal hybrid probabilistic logic programs under probabilistic answer set semantics. We formally prove the correctness of our approach. We show that the complexity of finding a policy for a reinforcement learning problem in our approach is NP-complete. In addition, we show that any reinforcement learning problem can be encoded as a classical logic program with answer set semantics. We also show that a reinforcement learning problem can be encoded as a SAT problem. We present a new high-level action description language that allows the factored representation of POMDPs. Moreover, we modify the original POMDP model so that it can distinguish between knowledge-producing actions and actions that change the environment.
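The distinction the abstract draws between knowledge-producing actions and actions that change the environment can be illustrated with a standard POMDP belief update. The sketch below is not the paper's logic-programming encoding. It is a generic Bayes filter over a hypothetical two-state domain, and every name in it (the states `locked` and `unlocked`, the actions `check` and `unlock`, and all probabilities) is an assumption made for illustration only.

```python
from typing import Dict

Belief = Dict[str, float]
Model = Dict[str, Dict[str, float]]


def belief_update(belief: Belief, transition: Model,
                  observation_model: Model, observation: str) -> Belief:
    """One Bayes filter step: predict with the transition model, then
    correct with the likelihood of the received observation."""
    states = list(belief)
    # Prediction: push the current belief through the transition model.
    predicted = {
        s2: sum(belief[s1] * transition[s1].get(s2, 0.0) for s1 in states)
        for s2 in states
    }
    # Correction: weight each state by how likely the observation is there.
    unnormalized = {
        s: observation_model[s].get(observation, 0.0) * predicted[s]
        for s in states
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical knowledge-producing action "check": the state is left
# unchanged (identity transition), but the observation is informative.
check_T = {"locked": {"locked": 1.0}, "unlocked": {"unlocked": 1.0}}
check_O = {
    "locked": {"looks_locked": 0.8, "looks_unlocked": 0.2},
    "unlocked": {"looks_locked": 0.2, "looks_unlocked": 0.8},
}

# Hypothetical environment-changing action "unlock": the state may change,
# but the only observation ("none") carries no information.
unlock_T = {
    "locked": {"locked": 0.1, "unlocked": 0.9},
    "unlocked": {"unlocked": 1.0},
}
unlock_O = {"locked": {"none": 1.0}, "unlocked": {"none": 1.0}}

b0 = {"locked": 0.5, "unlocked": 0.5}
b1 = belief_update(b0, check_T, check_O, "looks_locked")  # sensing only
b2 = belief_update(b1, unlock_T, unlock_O, "none")        # acting only
```

In this toy setup the sensing action shifts the belief without altering the world, while the acting step shifts the belief only through the transition model, which is one simple way to picture the two kinds of action side by side.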

Author information

Authors and Affiliations

Department of Computer Science, Gulf University for Science and Technology, Mishref, Kuwait
Emad Saad

Editor information

Editors and Affiliations

CRIL-CNRS, University of Artois, 62307, France
Salem Benferhat
Department of Mathematics, Towson University, 21252, Towson, MD, USA
John Grant

About this paper

Cite this paper

Saad, E. (2011). Learning to Act Optimally in Partially Observable Markov Decision Processes Using Hybrid Probabilistic Logic Programs. In: Benferhat, S., Grant, J. (eds) Scalable Uncertainty Management. SUM 2011. Lecture Notes in Computer Science, vol 6929. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23963-2_39

DOI: https://doi.org/10.1007/978-3-642-23963-2_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23962-5
Online ISBN: 978-3-642-23963-2
eBook Packages: Computer Science (R0)
