Reinforcement Learning: Insights from Interesting Failures in Parameter Selection

  • Wolfgang Konen
  • Thomas Bartz-Beielstein
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5199)


We investigate reinforcement learning methods, namely the temporal difference learning algorithm TD(λ), on game-learning tasks. Small modifications in algorithm setup and parameter choice can have a significant impact on success or failure to learn. We demonstrate that small differences in input features significantly influence the learning process. By selecting the right feature set we obtained good results within only 1/100 of the learning steps reported in the literature. We develop different metrics for measuring success in a reproducible manner, and we discuss why linear output functions are often preferable to sigmoid output functions.
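The core mechanism the abstract refers to, TD(λ) with eligibility traces, can be illustrated in a few lines. The sketch below is a hypothetical minimal example on a tabular random-walk prediction task, not the authors' game-learning setup (which uses neural value functions); all names and parameter values here are illustrative assumptions.

```python
import random

def td_lambda_chain(n_states=5, episodes=200, alpha=0.1, gamma=1.0,
                    lam=0.8, seed=0):
    """Tabular TD(lambda) value prediction on a random-walk chain.

    Toy task (illustrative only): states 0..n_states-1, episodes start
    in the middle and step left/right uniformly at random. Stepping off
    the right end yields reward 1, off the left end reward 0.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states                  # value estimates
    for _ in range(episodes):
        e = [0.0] * n_states              # eligibility traces
        s = n_states // 2
        while True:
            s_next = s + (1 if rng.random() < 0.5 else -1)
            if s_next < 0:
                r, done = 0.0, True       # left terminal
            elif s_next >= n_states:
                r, done = 1.0, True       # right terminal
            else:
                r, done = 0.0, False
            # TD error: one-step target minus current estimate
            target = r if done else r + gamma * V[s_next]
            delta = target - V[s]
            e[s] += 1.0                   # accumulating trace
            for i in range(n_states):
                V[i] += alpha * delta * e[i]
                e[i] *= gamma * lam       # decay all traces
            if done:
                break
            s = s_next
    return V

V = td_lambda_chain()
```

For this classic 5-state random walk the true values are 1/6, 2/6, ..., 5/6, so the learned estimates should increase from left to right. The trace decay factor λ interpolates between one-step TD (λ=0) and Monte-Carlo-like updates (λ=1), which is exactly the kind of parameter whose choice the paper examines.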


Keywords: Neuron · Strategic Game · Learning Agent · Board Position · Reinforcement Learning Agent





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Wolfgang Konen¹
  • Thomas Bartz-Beielstein¹
  1. Faculty for Computer Science and Engineering Science, Cologne University of Applied Sciences, Gummersbach, Germany
