Natural Computing, Volume 8, Issue 1, pp 57–99

A learning classifier system for mazes with aliasing clones

  • Zhanna V. Zatuchna
  • Anthony J. Bagnall


Maze problems represent a simplified virtual model of the real environment and can be used for developing core algorithms for many real-world applications related to navigation. Learning Classifier Systems (LCS) are the most widely used class of algorithms for reinforcement learning in mazes. However, the best results achieved by LCSs on maze problems are still mostly limited to non-aliasing environments, while the complexity of LCSs seems to obstruct a proper analysis of the reasons for failure. Moreover, there is a lack of knowledge of what makes a maze problem hard for a learning agent to solve. To overcome this restriction we try to improve our understanding of the nature and structure of maze environments. In this paper we describe a new LCS agent that has a simpler and more transparent performance mechanism. We use the structure of a predictive LCS model, strip out the evolutionary mechanism, simplify the reinforcement learning procedure and equip the agent with Associative Perception, an ability adopted from psychology. We then assess the new LCS with Associative Perception on an extensive set of mazes and analyse the results to discover which features of the environments play the most significant role in the learning process. We identify a feature that makes learning in mazes particularly hard, aliasing clones, which arise when groups of aliasing cells occur in similar patterns in different parts of the maze. We discuss the impact of aliasing clones and other types of aliasing on learning algorithms.
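The notion of aliasing cells can be illustrated with a minimal sketch. It assumes that an agent perceives only the eight cells surrounding its position, so two distinct cells are aliasing when those surroundings look identical. The toy maze, the symbols and the function names below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy maze (not from the paper): '#' wall, '.' free cell, 'F' goal.
MAZE = [
    "#######",
    "#.....#",
    "#.#####",
    "#....F#",
    "#######",
]

# Assumed perception: the eight neighbouring cells, clockwise from north.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def perception(maze, row, col):
    """Return what the agent would sense at (row, col) as an 8-character string."""
    return "".join(maze[row + dr][col + dc] for dr, dc in OFFSETS)


def aliasing_groups(maze):
    """Group free cells that share a perception; keep groups of two or more."""
    groups = defaultdict(list)
    for r, line in enumerate(maze):
        for c, ch in enumerate(line):
            if ch == ".":
                groups[perception(maze, r, c)].append((r, c))
    return [cells for cells in groups.values() if len(cells) > 1]


if __name__ == "__main__":
    for cells in aliasing_groups(MAZE):
        print(cells)
```

In this toy maze the cells (1, 3), (1, 4) and (3, 3) look the same to the agent although they lie at different distances from the goal, so a policy based on the current perception alone cannot tell them apart.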


Learning agents, Learning Classifier Systems, Associative perception, Navigation, Aliasing, Maze



Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  1. School of Computing Sciences, University of East Anglia, Norwich, England
