
Hybrid adaptive heuristic critic architectures for learning in mazes with continuous search spaces

  • A. G. Pipe
  • T. C. Fogarty
  • A. Winfield
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 866)

Abstract

We present the first results obtained from two implementations of a hybrid architecture which balances exploration and exploitation to solve mazes with continuous search spaces. In both cases the critic is built around a Radial Basis Function (RBF) Neural Network, which uses Temporal Difference learning to acquire a continuous-valued internal model of the environment through interaction with it. Also in both cases an Evolutionary Algorithm is employed in the search policy for each movement: a Genetic Algorithm (GA) in the first implementation and an Evolutionary Strategy (ES) in the second. Over successive trials the maze-solving agent learns the V-function, a mapping between real-valued positions in the maze and the value of being at those positions.
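As a rough illustration of how the components named in the abstract could fit together, the sketch below pairs an RBF critic trained by a TD(0) update with a simplified ES-style search over heading angles for each movement. It is not the authors' implementation. The unit-square world without walls, the grid of Gaussian centres, the fixed step length, the deterministic mutation-width decay, and all names and parameter values (`RBFCritic`, `es_choose_move`, `run_trial`, `width`, `lr`, `gamma`, `mu`, `lam`) are assumptions made for the example.

```python
import math
import random


class RBFCritic:
    """V(x, y) approximated as a weighted sum of Gaussian basis functions."""

    def __init__(self, grid=5, width=0.2, lr=0.1, gamma=0.95):
        # Assumed layout: centres evenly spaced over the unit square.
        self.centres = [(i / (grid - 1), j / (grid - 1))
                        for i in range(grid) for j in range(grid)]
        self.weights = [0.0] * len(self.centres)
        self.width, self.lr, self.gamma = width, lr, gamma

    def features(self, pos):
        x, y = pos
        return [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * self.width ** 2))
                for cx, cy in self.centres]

    def value(self, pos):
        return sum(w * f for w, f in zip(self.weights, self.features(pos)))

    def td_update(self, pos, reward, next_pos, terminal):
        """One TD(0) step on the output weights; returns the TD error."""
        target = reward + (0.0 if terminal else self.gamma * self.value(next_pos))
        error = target - self.value(pos)
        for k, f in enumerate(self.features(pos)):
            self.weights[k] += self.lr * error * f
        return error


def es_choose_move(critic, pos, step=0.1, mu=3, lam=12, generations=5, rng=random):
    """Simplified (mu, lambda)-style search over heading angles.

    Fitness of an angle is the critic's value at the position it leads to.
    The mutation width shrinks on a fixed schedule (an assumption; a full ES
    would usually self-adapt it).
    """
    def move(angle):
        x = min(1.0, max(0.0, pos[0] + step * math.cos(angle)))
        y = min(1.0, max(0.0, pos[1] + step * math.sin(angle)))
        return (x, y)

    sigma = math.pi / 2
    parents = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(mu)]
    for _ in range(generations):
        offspring = [rng.choice(parents) + rng.gauss(0.0, sigma) for _ in range(lam)]
        offspring.sort(key=lambda a: critic.value(move(a)), reverse=True)
        parents = offspring[:mu]
        sigma *= 0.7
    return move(parents[0])


def run_trial(critic, start=(0.1, 0.1), goal=(0.9, 0.9), max_steps=200, rng=random):
    """Run one trial; reward 1 on reaching the goal region, 0 otherwise."""
    pos = start
    for t in range(max_steps):
        nxt = es_choose_move(critic, pos, rng=rng)
        done = math.hypot(nxt[0] - goal[0], nxt[1] - goal[1]) < 0.1
        critic.td_update(pos, 1.0 if done else 0.0, nxt, done)
        pos = nxt
        if done:
            return t + 1
    return max_steps
```

Calling `run_trial` repeatedly on the same `RBFCritic` instance corresponds to the "successive trials" of the abstract: the weights persist between trials, so the learned value surface increasingly guides the per-move search. A real maze would additionally need wall collision checks inside `move`.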

Keywords

Genetic Algorithm; Radial Basis Function; Evolutionary Strategy; Radial Basis Function Neural Network; Radial Basis Function Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • A. G. Pipe (1)
  • T. C. Fogarty (2)
  • A. Winfield (1)
  1. Intelligent Autonomous Systems Lab., Faculty of Engineering, University of the West of England, Frenchay, Bristol, UK
  2. Bristol Transputer Centre, Faculty of Computer Science & Math, University of the West of England, Frenchay, Bristol, UK
