Hybrid adaptive heuristic critic architectures for learning in mazes with continuous search spaces

Pipe, A. G.; Fogarty, T. C.; Winfield, A.

doi:10.1007/3-540-58484-6_291

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 866)

Included in the conference series: International Conference on Parallel Problem Solving from Nature

Abstract

We present the first results obtained from two implementations of a hybrid architecture that balances exploration and exploitation to solve mazes with continuous search spaces. In both implementations the critic is built around a Radial Basis Function (RBF) neural network, which uses Temporal Difference learning to acquire a continuous-valued internal model of the environment through interaction with it. Both also employ an Evolutionary Algorithm in the search policy for each movement: the first uses a Genetic Algorithm (GA) and the second an Evolutionary Strategy (ES). Over successive trials the maze-solving agent learns the V-function, a mapping from real-valued positions in the maze to the value of being at those positions.
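
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names (`RBFCritic`, `es_move`, `run_trial`), the regular grid of Gaussian centres, the learning rate, the reward of 1 at the goal, and the obstacle-free unit square standing in for a maze are all assumptions made for illustration. The critic is a linear combination of Gaussian RBFs trained with a TD(0) update, and each movement is chosen by a small (1+λ)-style evolutionary search over the heading angle, scored by the critic.

```python
import math
import random


class RBFCritic:
    """Gaussian RBF approximation of V(x, y) on the unit square (illustrative)."""

    def __init__(self, grid=5, width=0.25, lr=0.1, gamma=0.95):
        # Assumed layout: centres on a regular grid x grid lattice.
        self.centres = [(i / (grid - 1), j / (grid - 1))
                        for i in range(grid) for j in range(grid)]
        self.weights = [0.0] * len(self.centres)
        self.width, self.lr, self.gamma = width, lr, gamma

    def features(self, pos):
        return [math.exp(-((pos[0] - cx) ** 2 + (pos[1] - cy) ** 2)
                         / (2 * self.width ** 2))
                for cx, cy in self.centres]

    def value(self, pos):
        return sum(w * f for w, f in zip(self.weights, self.features(pos)))

    def td_update(self, pos, reward, next_pos, terminal=False):
        """TD(0): move V(pos) toward reward + gamma * V(next_pos)."""
        bootstrap = 0.0 if terminal else self.gamma * self.value(next_pos)
        delta = reward + bootstrap - self.value(pos)
        feats = self.features(pos)
        norm = sum(f * f for f in feats)  # normalised step: V(pos) moves by lr * delta
        for i, f in enumerate(feats):
            self.weights[i] += self.lr * delta * f / norm
        return delta


def es_move(critic, pos, step=0.1, offspring=10, generations=5, rng=random):
    """Pick one movement by a (1+lambda)-style search over the heading angle."""
    def candidate(angle):
        # Clamp to the unit square; a real maze would also test for walls.
        x = min(1.0, max(0.0, pos[0] + step * math.cos(angle)))
        y = min(1.0, max(0.0, pos[1] + step * math.sin(angle)))
        return (x, y)

    best = rng.uniform(0.0, 2 * math.pi)
    sigma = math.pi / 2
    for _ in range(generations):
        children = [best + rng.gauss(0.0, sigma) for _ in range(offspring)]
        best = max(children + [best], key=lambda a: critic.value(candidate(a)))
        sigma *= 0.7  # fixed decay of the mutation size, not self-adaptation
    return candidate(best)


def run_trial(critic, start, goal, goal_radius=0.1, max_steps=200, rng=random):
    """One trial: move with the evolutionary search, learn with TD(0)."""
    pos = start
    for t in range(max_steps):
        nxt = es_move(critic, pos, rng=rng)
        done = math.dist(nxt, goal) < goal_radius
        critic.td_update(pos, 1.0 if done else 0.0, nxt, terminal=done)
        pos = nxt
        if done:
            return t + 1
    return max_steps


if __name__ == "__main__":
    critic = RBFCritic()
    for trial in range(20):
        steps = run_trial(critic, (0.1, 0.1), (0.9, 0.9))
        print(f"trial {trial}: {steps} steps")
```

In this sketch exploration comes only from the randomness of the evolutionary search while the critic is still flat; how the paper itself balances exploration against exploitation is not specified in the abstract.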

Author information

Authors and Affiliations

Intelligent Autonomous Systems Lab., Faculty of Engineering, University of the West of England, Coldharbour Lane, BS16 1QY, Frenchay, Bristol, UK
A. G. Pipe & A. Winfield
Bristol Transputer Centre, Faculty of Computer Science & Math, University of the West of England, Coldharbour Lane, BS16 1QY, Frenchay, Bristol, UK
T. C. Fogarty

Editor information

Yuval Davidor, Hans-Paul Schwefel, Reinhard Männer

Cite this paper

Pipe, A.G., Fogarty, T.C., Winfield, A. (1994). Hybrid adaptive heuristic critic architectures for learning in mazes with continuous search spaces. In: Davidor, Y., Schwefel, HP., Männer, R. (eds) Parallel Problem Solving from Nature — PPSN III. PPSN 1994. Lecture Notes in Computer Science, vol 866. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-58484-6_291

DOI: https://doi.org/10.1007/3-540-58484-6_291
Published: 08 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58484-1
Online ISBN: 978-3-540-49001-2
eBook Packages: Springer Book Archive
