Beyond Reward: The Problem of Knowledge and Data

Conference paper
Inductive Logic Programming (ILP 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7207)

Abstract

Intelligence can be defined, informally, as knowing a lot and being able to use that knowledge flexibly to achieve one’s goals. In this sense it is clear that knowledge is central to intelligence. However, it is less clear exactly what knowledge is, what gives it meaning, and how it can be efficiently acquired and used. In this talk we re-examine aspects of these age-old questions in light of modern experience (and particularly in light of recent work in reinforcement learning). Such questions are not just of philosophical or theoretical import; they directly affect the practicality of modern knowledge-based systems, which tend to become unwieldy and brittle, and thus difficult to change, as the knowledge base becomes large and diverse.
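One concrete form this takes in the reinforcement-learning work the talk draws on (e.g., the Horde architecture of Sutton et al., 2011) is knowledge as learnable prediction: a piece of knowledge is a forecast of future sensorimotor data that can be verified against experience rather than maintained by hand. Below is a minimal sketch of that idea, a single predictive question answered by linear TD(0); the toy environment, features, and parameters are illustrative assumptions, not code from the paper.

```python
import numpy as np

# A minimal sketch of "knowledge as prediction": one predictive question,
# "how much discounted cumulant will I observe from here?", answered by
# linear TD(0). The 16-state ring world and aliased features below are
# hypothetical, chosen only to make the example self-contained.

rng = np.random.default_rng(0)

n_features = 8
w = np.zeros(n_features)    # learned weights: the agent's "knowledge"
alpha = 0.1                 # step size
gamma = 0.9                 # discount: the question's time horizon

def features(state):
    # Aliased one-hot features: two ring states share each feature.
    phi = np.zeros(n_features)
    phi[state % n_features] = 1.0
    return phi

state = 0
for _ in range(10_000):
    next_state = (state + rng.integers(1, 3)) % 16  # toy dynamics: advance 1 or 2
    cumulant = 1.0 if next_state == 0 else 0.0      # the signal being predicted
    phi, phi_next = features(state), features(next_state)
    # TD(0): move the prediction w.phi toward the one-step bootstrapped target.
    td_error = cumulant + gamma * (w @ phi_next) - (w @ phi)
    w += alpha * td_error * phi
    state = next_state

print("learned predictions per feature:", np.round(w, 3))
```

After training, each weight approximates the expected discounted future cumulant for the states sharing that feature. The point of the sketch is that this "fact" is defined, learned, and verifiable entirely in terms of data, which is one way the talk's question about what gives knowledge meaning can be answered without a hand-maintained knowledge base.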

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sutton, R.S. (2012). Beyond Reward: The Problem of Knowledge and Data. In: Muggleton, S.H., Tamaddoni-Nezhad, A., Lisi, F.A. (eds) Inductive Logic Programming. ILP 2011. Lecture Notes in Computer Science (LNAI), vol. 7207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31951-8_2

  • DOI: https://doi.org/10.1007/978-3-642-31951-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31950-1

  • Online ISBN: 978-3-642-31951-8
