Adaptive Learning in Continuous Environment Using Actor-Critic Design and Echo-State Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7426)

Abstract

Approximate adaptive dynamic programming has been studied extensively in recent years because of its potential to scale to problems with continuous state and action spaces. The framework of adaptive critic design (ACD) addresses this issue and has been demonstrated in several case studies. This paper proposes an implementation of ACD that uses an echo state network (ESN) as the critic. The ESN is trained online to estimate the utility function and to adapt the control policy of an embodied agent. In addition to having a simple training algorithm, the ESN structure facilitates the backpropagation of derivatives needed to adapt the controller. Experimental results on a mobile robot are provided to validate the proposed learning architecture.
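To make the idea of an ESN critic concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a tanh reservoir with fixed random weights, a linear readout trained online with a TD(0)-style rule, and a derivative of the value estimate with respect to the input that passes only through the current reservoir update. The class name `ESNCritic`, the helper `train_constant_reward`, and all parameter values (reservoir size, scaling, learning rate, discount) are hypothetical and chosen for illustration only.

```python
import math
import random


class ESNCritic:
    """Fixed random reservoir with a trainable linear readout (hypothetical sketch)."""

    def __init__(self, n_in, n_res=20, scale=0.9, seed=0):
        rng = random.Random(seed)
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                     for _ in range(n_res)]
        w = [[rng.uniform(-1.0, 1.0) for _ in range(n_res)] for _ in range(n_res)]
        # Rescale so the largest absolute row sum equals `scale` < 1. With tanh
        # units this makes the state update a contraction, so the influence of
        # old states fades over time.
        norm = max(sum(abs(v) for v in row) for row in w)
        self.w_res = [[scale * v / norm for v in row] for row in w]
        self.w_out = [0.0] * n_res  # the only trained weights

    def next_state(self, x, u):
        """Reservoir update x' = tanh(W_in u + W_res x)."""
        return [math.tanh(sum(a * b for a, b in zip(wi, u)) +
                          sum(a * b for a, b in zip(wr, x)))
                for wi, wr in zip(self.w_in, self.w_res)]

    def value(self, x):
        """Linear readout: estimated value of reservoir state x."""
        return sum(w * xi for w, xi in zip(self.w_out, x))

    def td_update(self, x, reward, x_next, gamma=0.5, lr=0.1):
        """One online TD(0) step on the readout weights; returns the TD error."""
        delta = reward + gamma * self.value(x_next) - self.value(x)
        self.w_out = [w + lr * delta * xi for w, xi in zip(self.w_out, x)]
        return delta

    def input_gradient(self, x_prev, u):
        """dV/du through the current reservoir update only (x_prev held fixed)."""
        x = self.next_state(x_prev, u)
        return [sum(wo * (1.0 - xi * xi) * wi[j]
                    for wo, xi, wi in zip(self.w_out, x, self.w_in))
                for j in range(len(u))]


def train_constant_reward(steps=1000, u=(0.5, -0.3), reward=1.0, gamma=0.5):
    """Toy run with constant input and reward; the value should approach r / (1 - gamma)."""
    critic = ESNCritic(n_in=len(u))
    x = [0.0] * len(critic.w_out)
    for _ in range(steps):
        x_next = critic.next_state(x, u)
        critic.td_update(x, reward, x_next, gamma=gamma)
        x = x_next
    return critic, x
```

Only the readout weights are updated, which is what keeps the training rule simple. In an actor-critic loop, the vector returned by `input_gradient` is the kind of derivative that could be passed back to a controller whose action forms part of the critic input; how the paper couples the critic to the controller is not specified in the abstract, so that step is left out here.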

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Oubbati, M., Uhlemann, J., Palm, G. (2012). Adaptive Learning in Continuous Environment Using Actor-Critic Design and Echo-State Networks. In: Ziemke, T., Balkenius, C., Hallam, J. (eds) From Animals to Animats 12. SAB 2012. Lecture Notes in Computer Science, vol 7426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33093-3_32

  • DOI: https://doi.org/10.1007/978-3-642-33093-3_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33092-6

  • Online ISBN: 978-3-642-33093-3

  • eBook Packages: Computer Science, Computer Science (R0)
