Abstract
Recent developments in relational reinforcement learning (RRL) have produced a number of new algorithms. A theory that explains why RRL works, however, seems to be lacking. In this paper, we provide some initial results towards such a theory. To this end, we introduce a novel representation formalism, called logical Markov decision programs (LOMDPs), that integrates Markov decision processes (MDPs) with logic programs. Using LOMDPs, one can compactly and declaratively represent complex MDPs. Within this framework we then devise a relational upgrade of TD(λ), called logical TD(λ), and prove its convergence. Experiments validate our approach.
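For readers unfamiliar with the algorithm being upgraded, the following is a minimal sketch of *standard* tabular TD(λ) with accumulating eligibility traces, the classical method that the paper lifts to the relational setting. The state representation, episode format, and parameter values here are illustrative assumptions, not the paper's logical TD(λ).

```python
from collections import defaultdict

def td_lambda(episodes, gamma=0.9, alpha=0.1, lam=0.8):
    """Tabular TD(lambda) with accumulating eligibility traces.

    episodes: list of episodes; each episode is a list of
    (state, reward, next_state) transitions (an assumed format).
    """
    V = defaultdict(float)          # state-value estimates
    for episode in episodes:
        e = defaultdict(float)      # eligibility traces, reset per episode
        for (s, r, s_next) in episode:
            # TD error for the observed transition
            delta = r + gamma * V[s_next] - V[s]
            e[s] += 1.0             # accumulating trace for the visited state
            # propagate the TD error to all recently visited states
            for x in list(e):
                V[x] += alpha * delta * e[x]
                e[x] *= gamma * lam
    return V
```

On a simple two-state chain (`a -> b -> end`, reward 1 on the final step), the estimates converge towards V(b) ≈ 1 and V(a) ≈ γ·V(b); logical TD(λ) performs the analogous update over abstract states described by logical atoms rather than ground states.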
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Kersting, K., De Raedt, L. (2004). Logical Markov Decision Programs and the Convergence of Logical TD(λ). In: Camacho, R., King, R., Srinivasan, A. (eds) Inductive Logic Programming. ILP 2004. Lecture Notes in Computer Science(), vol 3194. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30109-7_16
Print ISBN: 978-3-540-22941-4
Online ISBN: 978-3-540-30109-7