Logical Markov Decision Programs and the Convergence of Logical TD(λ)

  • Conference paper
Inductive Logic Programming (ILP 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3194)

Abstract

Recent developments in the area of relational reinforcement learning (RRL) have resulted in a number of new algorithms. A theory that explains why RRL works, however, seems to be lacking. In this paper, we provide some initial results on a theory of RRL. To this end, we introduce a novel representation formalism, called logical Markov decision programs (LOMDPs), that integrates Markov Decision Processes (MDPs) with Logic Programs. Using LOMDPs one can compactly and declaratively represent complex MDPs. Within this framework we then devise a relational upgrade of TD(λ), called logical TD(λ), and prove its convergence. Experiments validate our approach.
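To make the idea concrete, the following minimal Python sketch shows TD(λ) with eligibility traces computed over abstract states rather than ground states. It is an illustration under our own assumptions, not the algorithm from the paper: the abstract_state function, which stands in for the logical abstraction an LOMDP would provide (e.g. matching a ground state against abstract-state clauses), and the trajectory format are hypothetical.

    from collections import defaultdict

    def logical_td_lambda_sketch(trajectories, abstract_state,
                                 alpha=0.1, gamma=0.9, lam=0.8):
        """TD(lambda) with accumulating eligibility traces over abstract states.

        trajectories: iterable of episodes; each episode is a list of
            (ground_state, reward) pairs, where reward is received on
            leaving that state.
        abstract_state: hypothetical mapping from a ground state to the
            abstract state it belongs to (e.g. the head of the first
            matching abstract-state clause in an LOMDP).
        """
        V = defaultdict(float)            # one value estimate per abstract state
        for episode in trajectories:
            e = defaultdict(float)        # eligibility traces, reset per episode
            for (s, r), (s_next, _) in zip(episode, episode[1:]):
                a, a_next = abstract_state(s), abstract_state(s_next)
                delta = r + gamma * V[a_next] - V[a]   # TD error on abstract values
                e[a] += 1.0                            # accumulate trace for the visited abstract state
                for z in list(e):                      # update every abstract state with non-zero trace
                    V[z] += alpha * delta * e[z]
                    e[z] *= gamma * lam                # decay traces
        return V

Because many ground states map to the same abstract state, a single update generalizes experience across all of them; this is the intuition behind analysing a relational upgrade of TD(λ) at the level of abstract states.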

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kersting, K., De Raedt, L. (2004). Logical Markov Decision Programs and the Convergence of Logical TD(λ). In: Camacho, R., King, R., Srinivasan, A. (eds) Inductive Logic Programming. ILP 2004. Lecture Notes in Computer Science, vol. 3194. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30109-7_16

  • DOI: https://doi.org/10.1007/978-3-540-30109-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22941-4

  • Online ISBN: 978-3-540-30109-7

  • eBook Packages: Springer Book Archive
