Abstract
The work of Barto, Sutton and Anderson on the ACE/ASE model for reinforcement learning is here put in perspective. In their work, a state-control (input-output) map that balances a pole hinged on a moving cart for as long as possible is learned when the only information provided by the environment is a failure signal. This work has given rise to a large body of research in machine learning and artificial intelligence. Its relevance lies in the fact that it can be applied to the control of systems that are only partially known. A critical issue is the exploration of the state space, which may require impractical amounts of memory and learning time. Adaptive networks, studied extensively in recent years, offer a natural solution for implementing the learning system: they allow an adaptive partitioning of the state space according to the task difficulty experienced in the different regions.
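The scheme discussed in the abstract can be sketched in code: a fixed BOXES-style partition of the four-dimensional cart-pole state space into 162 regions, with an ASE (actor) weight and an ACE (critic) value per box, trained from nothing but a failure signal. This is a minimal illustrative sketch, not the authors' implementation; the learning rates, trace decays, and Gaussian action noise are assumed values chosen for demonstration.

```python
import math
import random

N_BOXES = 162                 # 3 * 3 * 6 * 3 boxes, as in the BOXES partition
ONE_DEG = math.pi / 180.0
SIX_DEG = 6.0 * ONE_DEG
FIFTY_DEG = 50.0 * ONE_DEG

def get_box(x, x_dot, theta, theta_dot):
    """Map the 4-D cart-pole state onto one of 162 boxes; -1 marks failure."""
    if abs(x) > 2.4 or abs(theta) > 12.0 * ONE_DEG:
        return -1
    box = 0 if x < -0.8 else (1 if x < 0.8 else 2)
    box += 0 if x_dot < -0.5 else (3 if x_dot < 0.5 else 6)
    if theta < -SIX_DEG:   t = 0
    elif theta < -ONE_DEG: t = 1
    elif theta < 0.0:      t = 2
    elif theta < ONE_DEG:  t = 3
    elif theta < SIX_DEG:  t = 4
    else:                  t = 5
    box += 9 * t
    box += 54 * (0 if theta_dot < -FIFTY_DEG else
                 (1 if theta_dot < FIFTY_DEG else 2))
    return box

# Standard cart-pole dynamics, Euler-integrated (illustrative constants).
G, M_CART, M_POLE, POLE_L, FORCE, TAU = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02

def step(state, action):
    x, x_dot, th, th_dot = state
    f = FORCE if action > 0 else -FORCE
    total = M_CART + M_POLE
    tmp = (f + M_POLE * POLE_L * th_dot ** 2 * math.sin(th)) / total
    th_acc = (G * math.sin(th) - math.cos(th) * tmp) / (
        POLE_L * (4.0 / 3.0 - M_POLE * math.cos(th) ** 2 / total))
    x_acc = tmp - M_POLE * POLE_L * th_acc * math.cos(th) / total
    return (x + TAU * x_dot, x_dot + TAU * x_acc,
            th + TAU * th_dot, th_dot + TAU * th_acc)

def train(trials=20, max_steps=500, seed=0):
    rng = random.Random(seed)
    w = [0.0] * N_BOXES                   # ASE: action weights
    v = [0.0] * N_BOXES                   # ACE: value (prediction) weights
    alpha, beta, gamma = 1.0, 0.5, 0.95   # assumed learning constants
    lam_w, lam_v = 0.9, 0.8               # eligibility-trace decay rates
    steps_per_trial = []
    for _ in range(trials):
        e = [0.0] * N_BOXES               # ASE eligibility traces
        xbar = [0.0] * N_BOXES            # ACE eligibility traces
        state = (0.0, 0.0, 0.0, 0.0)
        box = get_box(*state)
        steps = 0
        while box >= 0 and steps < max_steps:
            # ASE: noisy threshold unit chooses push-left or push-right.
            action = 1 if w[box] + rng.gauss(0.0, 0.01) > 0.0 else -1
            e[box] += (1.0 - lam_w) * action
            xbar[box] += 1.0 - lam_v
            p = v[box]                    # ACE prediction for current box
            state = step(state, action)
            box = get_box(*state)
            failed = box < 0
            r = -1.0 if failed else 0.0   # only failure is signalled
            r_hat = r + (0.0 if failed else gamma * v[box]) - p
            for i in range(N_BOXES):      # reinforce along the traces
                w[i] += alpha * r_hat * e[i]
                v[i] += beta * r_hat * xbar[i]
                e[i] *= lam_w
                xbar[i] *= lam_v
            steps += 1
        steps_per_trial.append(steps)
    return w, v, steps_per_trial
```

The fixed 162-box grid here is precisely the kind of a-priori partition the paper questions: an adaptive network would instead refine the partition only where the task proves difficult, avoiding the memory and learning-time cost of uniformly fine resolution.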
References
L.P. Kaelbling, M.L. Littman and A.W. Moore, Reinforcement Learning: A Survey, J. Artificial Intelligence Research 4, (1996) 237–285.
R.J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 8:3, (1992), 229–256.
R.S. Sutton, Learning to predict by the method of temporal differences, Machine Learning, 3:1, (1988), 9–44.
W.T. Miller III, R.S. Sutton and P.J. Werbos (Eds.), Neural Networks for Control, (1992) MIT Press.
A.G. Barto, R.S. Sutton and C.W. Anderson, Neuronlike adaptive elements that can solve difficult learning problems, IEEE Trans. Syst. Man and Cybern. SMC-13, (1983), 834–846.
D. Michie and R.A. Chambers, BOXES: An experiment in adaptive control, Machine Intelligence 2, E. Dale and D. Michie, Eds., Edinburgh: Oliver and Boyd, (1968), 137–152.
B. Widrow, N.K. Gupta and S. Maitra, Punish/reward: learning with a critic in adaptive threshold systems, IEEE Trans. Syst. Man Cybern. SMC-3, (1973), 455–465.
M.I. Jordan and D.E. Rumelhart, Forward models: Supervised learning with a distal teacher. Cognitive Science, 16, (1992), 307–354.
N.A. Borghese and M. Arbib, Generation of Temporal Sequences using Local Dynamic Programming, Neural Networks 8:1, (1995), 39–54.
C.W. Anderson, Learning to control an inverted pendulum using neural networks. IEEE Control Systems Magazine (1989) 31–37.
H.A. Berenji and P. Khedkar, Learning and tuning fuzzy logic controllers through reinforcement. IEEE Trans. on Neural Networks, 3:5 (1992), 724–740.
N.A. Borghese and S. Ferrari, Hierarchical RBF networks in function approximation, Neurocomputing, (1998).
M. Cannon and J.E. Slotine, Space-Frequency localized basis function networks for nonlinear system estimation and control, Neurocomputing 9 (1995) 293–342.
B. Fritzke, Growing Cell Structures: A Self-organizing Network for Unsupervised and Supervised Learning, Neural Networks 7 (1994) 1441–1460.
T.M. Martinetz, S.G. Berkovich and K.J. Schulten, "Neural-Gas" Network for Vector Quantization and its Application to Time-Series Prediction, IEEE Trans. Neural Networks 4:4 (1993) 558–568.
A. Baraldi and E. Alpaydin, Simplified ART: A new class of ART algorithms, ICSI TR-98-004, (1998), Berkeley CA.
A.W. Moore, Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued spaces. In Proc. Eighth International Machine Learning Workshop, (1991).
V. Gullapalli, A stochastic reinforcement learning algorithm for learning real-valued functions, Neural Networks 3, (1990) 671–692.
G. Tesauro, TD-Gammon, a self-teaching backgammon program which achieves master-level play. Neural Computation, 6(2), (1994) 215–219.
© 1999 Springer-Verlag London Limited
Cite this paper
Borghese, N.A., Biemmi, C., Monaco, F.L. (1999). Learning to balance a pole on a movable cart through RL: what can be gained using Adaptive NN?. In: Marinaro, M., Tagliaferri, R. (eds) Neural Nets WIRN VIETRI-98. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0811-5_16
DOI: https://doi.org/10.1007/978-1-4471-0811-5_16
Publisher Name: Springer, London
Print ISBN: 978-1-4471-1208-2
Online ISBN: 978-1-4471-0811-5