
Robustness Analysis of SARSA(λ): Different Models of Reward and Initialisation

Conference paper
Artificial Intelligence: Methodology, Systems, and Applications (AIMSA 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5253)

Abstract

This paper examines the robustness of SARSA(λ), the reinforcement learning algorithm with eligibility traces, under different models of reward and different initialisations of the Q-table. Most empirical analyses of eligibility traces in the literature have focused on the step-penalty reward. We analyse two general types of reward, final-goal and step-penalty, and show that learning with long traces, i.e., with high values of λ, can lead to suboptimal solutions in some situations. These problems are identified and discussed. In particular, the results show that SARSA(λ) is sensitive to the choice of reward model and initialisation, and that in some cases its asymptotic performance is significantly reduced.
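To make the experimental setting concrete, below is a minimal sketch of tabular SARSA(λ) with accumulating eligibility traces on a hypothetical one-dimensional corridor task. The corridor environment, the reward_model switch (step_penalty vs. final_goal), the q_init parameter, and all hyperparameter values are illustrative assumptions, not details taken from the paper; they are meant only to expose the three knobs the paper varies: λ, the reward model, and the Q-table initialisation.

```python
import numpy as np

# A minimal sketch of tabular SARSA(lambda) with accumulating eligibility
# traces on a hypothetical 1-D corridor task. The environment, the
# reward-model switch, and all hyperparameters are illustrative
# assumptions, not details taken from the paper.

N_STATES = 10          # states 0..9; state 9 is the absorbing goal
ACTIONS = (-1, +1)     # move left / move right


def run_sarsa_lambda(lam, reward_model="step_penalty", q_init=0.0,
                     episodes=200, alpha=0.1, gamma=0.99, epsilon=0.1,
                     max_steps=10_000, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.full((N_STATES, len(ACTIONS)), float(q_init))  # Q-table initialisation

    def eps_greedy(s):
        if rng.random() < epsilon:
            return int(rng.integers(len(ACTIONS)))
        q = Q[s]
        return int(rng.choice(np.flatnonzero(q == q.max())))  # random tie-break

    for _ in range(episodes):
        e = np.zeros_like(Q)               # eligibility traces, reset each episode
        s, a = 0, eps_greedy(0)
        for _ in range(max_steps):
            s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            done = (s2 == N_STATES - 1)
            if reward_model == "step_penalty":
                r = -1.0                   # -1 on every transition
            else:                          # "final_goal"
                r = 1.0 if done else 0.0   # reward only on reaching the goal
            a2 = eps_greedy(s2)
            delta = r + (0.0 if done else gamma * Q[s2, a2]) - Q[s, a]
            e[s, a] += 1.0                 # accumulating trace
            Q += alpha * delta * e         # credit all recently visited pairs
            e *= gamma * lam               # decay all traces
            if done:
                break
            s, a = s2, a2
    return Q


# Example: compare short vs. long traces under the final-goal reward.
for lam in (0.0, 0.9):
    Q = run_sarsa_lambda(lam, reward_model="final_goal")
    print(f"lambda={lam}: greedy actions per state -> {np.argmax(Q, axis=1)}")
```

Sweeping λ (e.g., 0.0 versus 0.9) under each reward model and each initial Q-value gives the kind of comparison the paper performs; the paper itself may of course use different environments, trace types, and parameter settings.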

Editor information

Danail Dochev, Marco Pistore, Paolo Traverso

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grześ, M., Kudenko, D. (2008). Robustness Analysis of SARSA(λ): Different Models of Reward and Initialisation. In: Dochev, D., Pistore, M., Traverso, P. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2008. Lecture Notes in Computer Science, vol 5253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85776-1_13

  • DOI: https://doi.org/10.1007/978-3-540-85776-1_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85775-4

  • Online ISBN: 978-3-540-85776-1

  • eBook Packages: Computer Science (R0)
