Learning to balance a pole on a movable cart through RL: what can be gained using Adaptive NN?

Conference paper
Neural Nets WIRN VIETRI-98

Part of the book series: Perspectives in Neural Computing

Abstract

The work of Barto, Sutton and Anderson on the ACE/ASE model for Reinforcement Learning is here put in perspective. In their work, a state-control (input-output) map that balances a pole hinged on a moving cart for as long as possible is learned when the only information provided by the environment is a failure signal. This work has given rise to a large body of research in machine learning and artificial intelligence. Its relevance lies in the fact that it applies to the control of systems that are only partially known. A critical issue is the exploration of the state space, which may require an impractical amount of memory and learning time. Adaptive networks, studied intensively in recent years, offer a natural way to implement the learning system, since they allow the state space to be partitioned adaptively according to the task difficulty experienced in its different regions.
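
The abstract gives no implementation detail, so a minimal sketch may help fix ideas. The code below is an illustrative Python reconstruction of the ACE/ASE learner it refers to, using the fixed BOXES partition of the four-dimensional state (cart position and velocity, pole angle and angular velocity) into 3x3x6x3 = 162 boxes. The physical constants, box boundaries and learning rates follow Barto, Sutton and Anderson (1983); the function names, the Euler integrator and the training loop are assumptions of this sketch, not the authors' code. The reinforcement r is -1 at failure and 0 otherwise, exactly the information regime described in the abstract.

```python
import math, random

G, MC, MP, POLE_L, F, DT = 9.8, 1.0, 0.1, 0.5, 10.0, 0.02
N_BOXES = 162

def step(state, force):
    """One Euler step of the standard cart-pole dynamics."""
    x, dx, th, dth = state
    cos, sin = math.cos(th), math.sin(th)
    tmp = (force + MP * POLE_L * dth * dth * sin) / (MC + MP)
    ddth = (G * sin - cos * tmp) / (POLE_L * (4.0 / 3.0 - MP * cos * cos / (MC + MP)))
    ddx = tmp - MP * POLE_L * ddth * cos / (MC + MP)
    return (x + DT * dx, dx + DT * ddx, th + DT * dth, dth + DT * ddth)

def box(state):
    """Map the 4-D state to one of 162 boxes (3*3*6*3); -1 signals failure."""
    x, dx, th, dth = state
    deg, ddeg = math.degrees(th), math.degrees(dth)
    if abs(x) > 2.4 or abs(deg) > 12.0:
        return -1                                  # the only feedback: failure
    b = 0 if x < -0.8 else 1 if x < 0.8 else 2
    b = b * 3 + (0 if dx < -0.5 else 1 if dx < 0.5 else 2)
    b = b * 6 + (0 if deg < -6 else 1 if deg < -1 else 2 if deg < 0
                 else 3 if deg < 1 else 4 if deg < 6 else 5)
    b = b * 3 + (0 if ddeg < -50 else 1 if ddeg < 50 else 2)
    return b

# Learning parameters as reported by Barto, Sutton and Anderson (1983).
ALPHA, BETA, GAMMA, LAM_W, LAM_V, SIGMA = 1000.0, 0.5, 0.95, 0.9, 0.8, 0.01

w = [0.0] * N_BOXES      # ACE weights: prediction of (negative) reinforcement
v = [0.0] * N_BOXES      # ASE weights: push-right preference per box
xbar = [0.0] * N_BOXES   # ACE eligibility traces
e = [0.0] * N_BOXES      # ASE eligibility traces

random.seed(0)
s = (0.0, 0.0, 0.0, 0.0)
b = box(s)
run, best = 0, 0
for t in range(200000):
    # ASE: noisy threshold unit decides push direction (+F or -F).
    y = 1.0 if v[b] + random.gauss(0.0, SIGMA) > 0.0 else -1.0
    e[b] += (1.0 - LAM_V) * y        # mark this (box, action) pair eligible
    xbar[b] += 1.0 - LAM_W
    s2 = step(s, F * y)
    b2 = box(s2)
    failed = b2 < 0
    r = -1.0 if failed else 0.0      # external reinforcement only on failure
    # ACE: internal reinforcement = TD error of the failure prediction.
    rhat = r + (0.0 if failed else GAMMA * w[b2]) - w[b]
    for i in range(N_BOXES):
        v[i] += ALPHA * rhat * e[i]
        w[i] += BETA * rhat * xbar[i]
        e[i] *= LAM_V
        xbar[i] *= LAM_W
    if failed:                       # reset the cart-pole and the traces
        best = max(best, run)
        run = 0
        s = (0.0, 0.0, 0.0, 0.0)
        b = box(s)
        e = [0.0] * N_BOXES
        xbar = [0.0] * N_BOXES
    else:
        run += 1
        s, b = s2, b2
print("longest balancing run:", best, "steps")
```

The fixed 162-box grid is exactly what makes exploration expensive: resolution is committed uniformly and in advance. The adaptive networks advocated in this paper would instead refine the partition only in the regions where failures concentrate, that is, where the task is hard.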


References

  1. L.P. Kaelbling, M.L. Littman and A.W. Moore, Reinforcement learning: a survey, Journal of Artificial Intelligence Research 4 (1996) 237–285.

  2. R.J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning 8:3 (1992) 229–256.

  3. R.S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning 3:1 (1988) 9–44.

  4. W.T. Miller III, R.S. Sutton and P.J. Werbos (Eds.), Neural Networks for Control, MIT Press (1992).

  5. A.G. Barto, R.S. Sutton and C.W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Trans. Syst. Man Cybern. SMC-13 (1983) 834–846.

  6. D. Michie and R.A. Chambers, BOXES: an experiment in adaptive control, in Machine Intelligence 2, E. Dale and D. Michie (Eds.), Oliver and Boyd, Edinburgh (1968) 137–152.

  7. B. Widrow, N.K. Gupta and S. Maitra, Punish/reward: learning with a critic in adaptive threshold systems, IEEE Trans. Syst. Man Cybern. SMC-3 (1973) 455–465.

  8. M.I. Jordan and D.E. Rumelhart, Forward models: supervised learning with a distal teacher, Cognitive Science 16 (1992) 307–354.

  9. N.A. Borghese and M.A. Arbib, Generation of temporal sequences using local dynamic programming, Neural Networks 8:1 (1995) 39–54.

  10. C.W. Anderson, Learning to control an inverted pendulum using neural networks, IEEE Control Systems Magazine (1989) 31–37.

  11. H.R. Berenji and P. Khedkar, Learning and tuning fuzzy logic controllers through reinforcements, IEEE Trans. Neural Networks 3:5 (1992) 724–740.

  12. N.A. Borghese and S. Ferrari, Hierarchical RBF networks in function approximation, Neurocomputing (1998).

  13. M. Cannon and J.-J.E. Slotine, Space-frequency localized basis function networks for nonlinear system estimation and control, Neurocomputing 9 (1995) 293–342.

  14. B. Fritzke, Growing cell structures: a self-organizing network for unsupervised and supervised learning, Neural Networks 7 (1994) 1441–1460.

  15. T.M. Martinetz, S.G. Berkovich and K.J. Schulten, "Neural-gas" network for vector quantization and its application to time-series prediction, IEEE Trans. Neural Networks 4:4 (1993) 558–568.

  16. A. Baraldi and E. Alpaydin, Simplified ART: a new class of ART algorithms, ICSI TR-98-004, Berkeley, CA (1998).

  17. A.W. Moore, Variable resolution dynamic programming: efficiently learning action maps in multivariate real-valued state-spaces, in Proc. Eighth International Machine Learning Workshop (1991).

  18. V. Gullapalli, A stochastic reinforcement learning algorithm for learning real-valued functions, Neural Networks 3 (1990) 671–692.

  19. G. Tesauro, TD-Gammon, a self-teaching backgammon program, achieves master-level play, Neural Computation 6:2 (1994) 215–219.

Copyright information

© 1999 Springer-Verlag London Limited

About this paper

Cite this paper

Borghese, N.A., Biemmi, C., Monaco, F.L. (1999). Learning to balance a pole on a movable cart through RL: what can be gained using Adaptive NN? In: Marinaro, M., Tagliaferri, R. (eds) Neural Nets WIRN VIETRI-98. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-0811-5_16

  • DOI: https://doi.org/10.1007/978-1-4471-0811-5_16

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-1208-2

  • Online ISBN: 978-1-4471-0811-5
