
Q-Learning with Linear Function Approximation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4539)

Abstract

In this paper, we analyze the convergence of Q-learning with linear function approximation. We identify a set of conditions that ensures convergence of this method with probability 1 when a fixed learning policy is used, and we discuss the differences and similarities between our results and those obtained in several related works. We also discuss the applicability of the method when a changing policy is used and, finally, its use in partially observable scenarios.

This work was partially supported by the Programa Operacional Sociedade do Conhecimento (POS_C), which includes FEDER funds.
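For readers unfamiliar with the method under analysis, the following is a minimal sketch of Q-learning with linear function approximation under a fixed behaviour policy, the setting studied in the paper. It is an illustration only, not the paper's notation or pseudocode: the environment interface (env.reset(), env.step()), the feature map featurize, and the constant step size alpha are assumptions made for the example.

import numpy as np

def linear_q_learning(env, featurize, num_actions, num_steps=10000,
                      gamma=0.95, alpha=0.1, seed=0):
    """Q-learning with a linear approximator Q(s, a) = theta . phi(s, a).

    Assumes env.reset() -> state, env.step(a) -> (state, reward, done),
    and featurize(s, a) -> fixed-length NumPy feature vector phi(s, a).
    Actions are drawn from a fixed uniform behaviour policy, matching the
    fixed-learning-policy setting considered in the analysis.
    """
    rng = np.random.default_rng(seed)
    state = env.reset()
    theta = np.zeros_like(featurize(state, 0), dtype=float)

    for _ in range(num_steps):
        action = rng.integers(num_actions)          # fixed random behaviour policy
        next_state, reward, done = env.step(action)

        phi = featurize(state, action)
        # Greedy one-step lookahead: max over actions of theta . phi(s', a')
        next_q = max(theta @ featurize(next_state, b) for b in range(num_actions))
        target = reward + (0.0 if done else gamma * next_q)

        # Update the linear parameters along the TD error times the features
        theta += alpha * (target - theta @ phi) * phi

        state = env.reset() if done else next_state

    return theta

Note that convergence analyses of this kind typically require suitably decaying (Robbins-Monro) step sizes; the constant alpha above is used only to keep the sketch short.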




Editor information

Nader H. Bshouty, Claudio Gentile

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Melo, F.S., Ribeiro, M.I. (2007). Q-Learning with Linear Function Approximation. In: Bshouty, N.H., Gentile, C. (eds) Learning Theory. COLT 2007. Lecture Notes in Computer Science, vol 4539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72927-3_23


  • DOI: https://doi.org/10.1007/978-3-540-72927-3_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72925-9

  • Online ISBN: 978-3-540-72927-3

  • eBook Packages: Computer Science, Computer Science (R0)
