Beyond Reward: The Problem of Knowledge and Data

  • Richard S. Sutton
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7207)

Abstract

Intelligence can be defined, informally, as knowing a lot and being able to use that knowledge flexibly to achieve one's goals. In this sense it is clear that knowledge is central to intelligence. However, it is less clear exactly what knowledge is, what gives it meaning, and how it can be efficiently acquired and used. In this talk we re-examine aspects of these age-old questions in light of modern experience, particularly recent work in reinforcement learning. Such questions are not just of philosophical or theoretical import; they directly affect the practicality of modern knowledge-based systems, which tend to become unwieldy and brittle (difficult to change) as the knowledge base becomes large and diverse.
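
The abstract's theme, knowledge that is grounded in and verifiable against data, can be made concrete with a small reinforcement-learning example. The sketch below is illustrative only and does not come from the paper: a linear TD(0) learner that acquires a single predictive "fact" from a stream of experience, namely the discounted future value of a sensor signal. The toy stream, feature size, and all function names are assumptions made up for this sketch.

    import random

    def experience(n_steps, n_features=4, seed=0):
        # Toy stand-in for a robot's sensorimotor stream: each step yields
        # (feature vector, sensor reading). Purely illustrative.
        rng = random.Random(seed)
        for _ in range(n_steps):
            x = [rng.random() for _ in range(n_features)]
            yield x, x[0]  # pretend the sensor reading is the first feature

    def td0_predictor(stream, alpha=0.1, gamma=0.9, n_features=4):
        # Linear TD(0): learn weights w so that dot(w, x_t) approximates
        # the discounted sum of future sensor readings. The learned
        # prediction is a small piece of knowledge grounded in data.
        w = [0.0] * n_features
        x, _ = next(stream)
        for x_next, c_next in stream:
            v = sum(wi * xi for wi, xi in zip(w, x))
            v_next = sum(wi * xi for wi, xi in zip(w, x_next))
            delta = c_next + gamma * v_next - v  # TD error: data vs. prediction
            w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
            x = x_next
        return w

    print(td0_predictor(experience(10000)))

A prediction of this kind can be checked against subsequently observed data, which is one sense in which such knowledge can stay grounded, rather than brittle, as it accumulates.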

Keywords

Reinforcement Learning · Neural Information Processing System · Reinforcement Learning Algorithm · Temporal Abstraction · Battery Charger

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Richard S. Sutton, University of Alberta, Edmonton, Canada
