Abstract
Recent developments in relational reinforcement learning (RRL) have produced a number of new algorithms. A theory that explains why RRL works, however, seems to be lacking. In this paper, we provide some initial results towards such a theory. To this end, we introduce a novel representation formalism, called logical Markov decision programs (LOMDPs), that integrates Markov decision processes (MDPs) with logic programs. Using LOMDPs, one can compactly and declaratively represent complex MDPs. Within this framework we then devise a relational upgrade of TD(λ), called logical TD(λ), and prove its convergence. Experiments validate our approach.
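For readers unfamiliar with the algorithm being upgraded, the following is a minimal sketch of *standard* tabular TD(λ) with accumulating eligibility traces, the classical method that the paper lifts to the relational setting. The state representation, episode format, and parameter values here are illustrative assumptions, not the paper's logical TD(λ).

```python
from collections import defaultdict

def td_lambda(episodes, gamma=0.9, alpha=0.1, lam=0.8):
    """Tabular TD(lambda) with accumulating eligibility traces.

    episodes: list of episodes; each episode is a list of
    (state, reward, next_state) transitions (an assumed format).
    """
    V = defaultdict(float)          # state-value estimates
    for episode in episodes:
        e = defaultdict(float)      # eligibility traces, reset per episode
        for (s, r, s_next) in episode:
            # TD error for the observed transition
            delta = r + gamma * V[s_next] - V[s]
            e[s] += 1.0             # accumulating trace for the visited state
            # propagate the TD error to all recently visited states
            for x in list(e):
                V[x] += alpha * delta * e[x]
                e[x] *= gamma * lam
    return V
```

On a simple two-state chain (`a -> b -> end`, reward 1 on the final step), the estimates converge towards V(b) ≈ 1 and V(a) ≈ γ·V(b); logical TD(λ) performs the analogous update over abstract states described by logical atoms rather than ground states.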
© 2004 Springer-Verlag Berlin Heidelberg
Cite this paper
Kersting, K., De Raedt, L. (2004). Logical Markov Decision Programs and the Convergence of Logical TD(λ). In: Camacho, R., King, R., Srinivasan, A. (eds) Inductive Logic Programming. ILP 2004. Lecture Notes in Computer Science(), vol 3194. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30109-7_16
Print ISBN: 978-3-540-22941-4
Online ISBN: 978-3-540-30109-7