Abstract
In this paper, the robustness of SARSA(λ), a reinforcement learning algorithm with eligibility traces, is examined under different models of reward and different initialisations of the Q-table. Most empirical analyses of eligibility traces in the literature have focused on the step-penalty reward. We analyse two general types of reward (final-goal and step-penalty rewards) and show that learning with long traces, i.e., with high values of λ, can lead to suboptimal solutions in some situations. The problems are identified and discussed. Specifically, the results show that SARSA(λ) is sensitive to the model of reward and to the initialisation, and that in some cases the asymptotic performance can be significantly reduced.
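To make the two factors the abstract varies concrete, the following is a minimal Python sketch of tabular SARSA(λ) with accumulating eligibility traces, parameterised by the Q-table initialisation (q_init) and by the reward model (final-goal vs. step-penalty). The Chain environment, the function names, and all parameter values are illustrative assumptions for this sketch, not the authors' experimental setup.

```python
import random
from collections import defaultdict

class Chain:
    """Toy 1-D corridor (an assumed example task, not the paper's domain).
    reward_model='goal' pays +1 only on reaching the goal (final-goal reward);
    reward_model='step' pays -1 on every step (step-penalty reward)."""
    def __init__(self, length=10, reward_model="goal"):
        self.length, self.reward_model = length, reward_model
    def reset(self):
        self.pos = 0
        return self.pos
    def step(self, action):  # 0 = left, 1 = right
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos == self.length - 1
        if self.reward_model == "goal":
            r = 1.0 if done else 0.0
        else:
            r = -1.0
        return self.pos, r, done

def sarsa_lambda(env, n_actions, episodes=500, alpha=0.1, gamma=0.99,
                 lam=0.9, epsilon=0.1, q_init=0.0):
    """Tabular SARSA(lambda) with accumulating traces.
    q_init sets every initial Q-value; q_init > 0 with the goal reward
    (or q_init = 0 with the step penalty) acts as optimistic initialisation."""
    Q = defaultdict(lambda: q_init)

    def policy(s):  # epsilon-greedy over the current Q-table
        if random.random() < epsilon:
            return random.randrange(n_actions)
        return max(range(n_actions), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        e = defaultdict(float)  # eligibility traces, reset each episode
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = policy(s2) if not done else None
            target = r if done else r + gamma * Q[(s2, a2)]
            delta = target - Q[(s, a)]
            e[(s, a)] += 1.0  # accumulating trace on the visited pair
            for key in list(e):  # one TD error updates all traced pairs
                Q[key] += alpha * delta * e[key]
                e[key] *= gamma * lam  # long traces when lambda is high
                if e[key] < 1e-8:
                    del e[key]
            s, a = s2, a2
    return Q
```

A comparison in the spirit of the paper would run, e.g., `sarsa_lambda(Chain(reward_model="goal"), n_actions=2, lam=0.9)` against the same call with `reward_model="step"` or a different `q_init`, and compare the asymptotic quality of the greedy policies; with high λ each TD error is spread along the whole trace, which is where the interaction with the reward model and the initialisation arises.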
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grześ, M., Kudenko, D. (2008). Robustness Analysis of SARSA(λ): Different Models of Reward and Initialisation. In: Dochev, D., Pistore, M., Traverso, P. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2008. Lecture Notes in Computer Science, vol. 5253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85776-1_13
DOI: https://doi.org/10.1007/978-3-540-85776-1_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85775-4
Online ISBN: 978-3-540-85776-1