
Building Relational World Models for Reinforcement Learning

  • Conference paper
Inductive Logic Programming (ILP 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4894)

Abstract

Many reinforcement learning domains are highly relational. While traditional temporal-difference methods can be applied to these domains, they are limited in their capacity to exploit the domain's relational structure. Our algorithm, AMBIL, constructs relational world models in the form of relational Markov decision processes (MDPs). AMBIL works backwards from collections of high-reward states, using inductive logic programming to learn their preimages: logical definitions of the regions of state space that lead to the high-reward states via some action. These learned preimages are chained together to form an MDP that abstractly represents the domain. AMBIL estimates the reward and transition probabilities of this MDP from past experience. Since the resulting MDPs are small, AMBIL uses value iteration to quickly estimate the Q-value of each action in the induced states and thereby determine a policy. AMBIL can employ complex background knowledge and supports relational representations. Empirical evaluation on both synthetic domains and a subtask of the RoboCup soccer domain shows significant performance gains compared to standard Q-learning.
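
The paper itself provides no code, but the final planning step the abstract describes, running value iteration on the small abstract MDP whose rewards and transition probabilities were estimated from experience, is a standard algorithm and easy to illustrate. Below is a minimal Python sketch of that step only; the data structures (states, actions, P, R) and the discount factor gamma are illustrative assumptions, not AMBIL's actual interfaces.

def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    # states: list of abstract states (learned preimages)
    # actions[s]: actions available in state s
    # P[(s, a)]: dict mapping next state -> estimated transition probability
    # R[(s, a)]: estimated expected immediate reward
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if not actions[s]:
                continue  # absorbing state: value stays fixed
            # Bellman backup: best one-step lookahead over available actions
            best = max(
                R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
                for a in actions[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Q-values and greedy policy derived from the converged values
    Q = {(s, a): R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
         for s in states for a in actions[s]}
    policy = {s: max(actions[s], key=lambda a: Q[(s, a)]) for s in states if actions[s]}
    return Q, policy

# Toy two-state example (hypothetical, loosely in the spirit of a soccer subtask):
states = ["near_goal", "far"]
actions = {"near_goal": ["shoot"], "far": ["move"]}
P = {("near_goal", "shoot"): {"near_goal": 1.0},
     ("far", "move"): {"near_goal": 0.8, "far": 0.2}}
R = {("near_goal", "shoot"): 1.0, ("far", "move"): 0.0}
Q, policy = value_iteration(states, actions, P, R)

Because the abstract MDP has only a handful of induced states, this loop converges almost immediately, which is what lets AMBIL replan cheaply after each model update.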

Editor information

Hendrik Blockeel, Jan Ramon, Jude Shavlik, Prasad Tadepalli

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Walker, T., Torrey, L., Shavlik, J., Maclin, R. (2008). Building Relational World Models for Reinforcement Learning. In: Blockeel, H., Ramon, J., Shavlik, J., Tadepalli, P. (eds) Inductive Logic Programming. ILP 2007. Lecture Notes in Computer Science, vol. 4894. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78469-2_27

  • DOI: https://doi.org/10.1007/978-3-540-78469-2_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78468-5

  • Online ISBN: 978-3-540-78469-2

  • eBook Packages: Computer Science (R0)
