
Anticipatory Learning Classifier Systems and Factored Reinforcement Learning

  • Olivier Sigaud
  • Martin V. Butz
  • Olga Kozlova
  • Christophe Meyer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5499)

Abstract

Factored Reinforcement Learning (FRL) is a new technique for solving Factored Markov Decision Problems (FMDPs) when the structure of the problem is not known in advance. Like Anticipatory Learning Classifier Systems (ALCSs), it is a model-based Reinforcement Learning approach that includes generalization mechanisms in the presence of a structured domain. In general, FRL and ALCSs are explicit, state-anticipatory approaches that learn generalized state transition models to improve system behavior based on model-based reinforcement learning techniques. In this contribution, we highlight the conceptual similarities and differences between FRL and ALCSs, focusing on SPITI, an instance of an FRL method, on the one hand, and on the ALCSs MACS and XACS on the other. Though FRL systems seem to benefit from a clearer theoretical grounding, an empirical comparison between SPITI and XACS on two benchmark problems reveals that the latter scales much better than the former when some combinations of state variables do not occur. Based on this finding, we discuss the mechanisms in XACS that result in the better scalability and propose importing these mechanisms into FRL systems.
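Both FRL and ALCSs learn a generalized state transition model over a factored state and then plan with it. The Python sketch below is only a toy illustration of that shared idea, not SPITI, MACS or XACS. It assumes ACS-style condition-action-effect rules, in which '#' means "don't care" in a condition and "unchanged" in an effect. With that convention, five hand-written rules summarize all 24 transitions of a domain with three binary state variables, and value iteration plans on the resulting model. The domain, the rule set, and every name (`Rule`, `predict`, `value_iteration`, `goal_reward`) are hypothetical.

```python
from itertools import product

WILDCARD = "#"  # "don't care" in a condition, "unchanged" in an effect


class Rule:
    """Hypothetical ACS-style classifier: condition, action, anticipated effect."""

    def __init__(self, condition, action, effect):
        self.condition = condition
        self.action = action
        self.effect = effect

    def matches(self, state, action):
        # The rule applies when the action is the same and every
        # non-wildcard condition symbol equals the state variable's value.
        return action == self.action and all(
            c == WILDCARD or c == s for c, s in zip(self.condition, state)
        )

    def anticipate(self, state):
        # Variables marked '#' in the effect keep their current value.
        return tuple(s if e == WILDCARD else e for e, s in zip(self.effect, state))


def predict(rules, state, action):
    """Next state anticipated by the first matching rule, or None if none matches."""
    for rule in rules:
        if rule.matches(state, action):
            return rule.anticipate(state)
    return None


def value_iteration(rules, states, actions, reward, gamma=0.9, sweeps=50):
    """In-place value iteration on the deterministic model encoded by the rules."""
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            backups = []
            for a in actions:
                nxt = predict(rules, s, a)
                if nxt is not None:
                    backups.append(reward(nxt) + gamma * values[nxt])
            if backups:
                values[s] = max(backups)
    return values


# Hypothetical domain with three binary state variables (x0, x1, x2):
# a0 sets x0; a1 sets x1 only if x0 is set; a2 sets x2 only if x1 is set.
STATES = list(product((0, 1), repeat=3))
ACTIONS = ["a0", "a1", "a2"]
RULES = [
    Rule(("#", "#", "#"), "a0", (1, "#", "#")),
    Rule((1, "#", "#"), "a1", ("#", 1, "#")),
    Rule((0, "#", "#"), "a1", ("#", "#", "#")),  # a1 has no effect when x0 = 0
    Rule(("#", 1, "#"), "a2", ("#", "#", 1)),
    Rule(("#", 0, "#"), "a2", ("#", "#", "#")),  # a2 has no effect when x1 = 0
]


def goal_reward(state):
    """Reward 1 for any state in which x2 is set (hypothetical goal)."""
    return 1.0 if state[2] == 1 else 0.0
```

Each rule covers several states at once, which is the kind of generalization the abstract refers to. In the systems compared in the paper, such models are learned from experience rather than written by hand.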

Keywords

Benchmark Problem, Compact Model, Generalization Mechanism, Anticipatory Behavior, Reinforcement Learning Problem
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Olivier Sigaud (1)
  • Martin V. Butz (4)
  • Olga Kozlova (1, 2)
  • Christophe Meyer (3)
  1. Institut des Systèmes Intelligents et de Robotique (ISIR), CNRS UMR 7222, Université Pierre et Marie Curie - Paris 6, Paris, France
  2. Thales Security Solutions & Services, Simulation, Cergy Pontoise Cedex, France
  3. Thales Security Solutions & Services, ThereSIS Research and Innovation Office, Palaiseau Cedex, France
  4. University of Würzburg, Würzburg, Germany
