Adaptive Learning in Continuous Environment Using Actor-Critic Design and Echo-State Networks

Oubbati, Mohamed; Uhlemann, Johannes; Palm, Günther

doi:10.1007/978-3-642-33093-3_32

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7426)

Abstract

Approximate adaptive dynamic programming has been studied extensively in recent years because it can potentially scale to problems with continuous state and action spaces. The framework of adaptive critic design (ACD) addresses this problem and has been demonstrated in several case studies. This paper proposes an implementation of ACD that uses an echo state network (ESN) as the critic. The ESN is trained online to estimate the utility function and to adapt the control policy of an embodied agent. Besides its simple training algorithm, the ESN structure facilitates the backpropagation of the derivatives needed to adapt the controller. Experimental results with a mobile robot are provided to validate the proposed learning architecture.
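The abstract states two properties of an ESN critic: only the readout is trained (a simple, online-friendly update), and derivatives of the critic output can be propagated back through the reservoir to the controller. The following is a minimal sketch of that general idea under stated assumptions; it is not the authors' implementation. Everything here is hypothetical and chosen for illustration: the names `ESNCritic`, `grad_wrt_input` and `train`, an action-dependent critic whose input is `[state, action]`, a TD(0) update on the linear readout, a derivative truncated to one reservoir step, a linear policy `a = theta * s` with Gaussian exploration noise, a clipped 1-D toy plant with reward `-s**2`, and all sizes and learning rates.

```python
import math
import random


class ESNCritic:
    """Echo state network critic: fixed random reservoir, trainable linear readout."""

    def __init__(self, n_in, n_res, radius_bound=0.9, density=0.2, seed=0):
        rng = random.Random(seed)
        self.w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_res)]
        w = [[rng.gauss(0.0, 1.0) if rng.random() < density else 0.0
              for _ in range(n_res)] for _ in range(n_res)]
        # Rescale so the largest absolute row sum equals radius_bound. The spectral
        # radius never exceeds this induced infinity-norm, so it stays below 1.
        norm = max(sum(abs(v) for v in row) for row in w) or 1.0
        self.w = [[radius_bound * v / norm for v in row] for row in w]
        self.w_out = [0.0] * n_res  # only these weights are trained
        self.x = [0.0] * n_res      # current reservoir state

    def _next_state(self, u, x_prev):
        # x(t) = tanh(W_in u(t) + W x(t-1))
        return [math.tanh(sum(a * b for a, b in zip(row_in, u))
                          + sum(a * b for a, b in zip(row_r, x_prev)))
                for row_in, row_r in zip(self.w_in, self.w)]

    def value(self, x):
        # Critic output J = w_out . x
        return sum(a * b for a, b in zip(self.w_out, x))

    def step(self, u):
        """Advance the reservoir with input u = [state, action]; return (x, J)."""
        self.x = self._next_state(u, self.x)
        return self.x, self.value(self.x)

    def td_update(self, x, reward, next_value, alpha=0.01, gamma=0.95):
        """TD(0) on the readout only: w_out += alpha * delta * x."""
        delta = reward + gamma * next_value - self.value(x)
        self.w_out = [w + alpha * delta * xi for w, xi in zip(self.w_out, x)]
        return delta

    def grad_wrt_input(self, u, x_prev, k):
        """dJ/du_k through one reservoir step, holding x_prev fixed (truncated)."""
        x = self._next_state(u, x_prev)
        return sum(wo * (1.0 - xi * xi) * row_in[k]
                   for wo, xi, row_in in zip(self.w_out, x, self.w_in))


def train(steps=300, seed=1):
    """Toy action-dependent actor-critic loop on a hypothetical 1-D plant."""
    rng = random.Random(seed)
    critic = ESNCritic(n_in=2, n_res=30, seed=seed)
    theta = 0.0                                   # hypothetical policy gain
    s = rng.uniform(-1.0, 1.0)
    a = theta * s + rng.gauss(0.0, 0.1)           # action with exploration noise
    x_prev = list(critic.x)
    x, _ = critic.step([s, a])
    for _ in range(steps):
        s_next = max(-2.0, min(2.0, s + 0.1 * a))  # assumed clipped dynamics
        r = -s_next * s_next                       # reward: stay near the origin
        a_next = theta * s_next + rng.gauss(0.0, 0.1)
        x_next, j_next = critic.step([s_next, a_next])
        critic.td_update(x, r, j_next)
        # Actor: gradient ascent on J w.r.t. the action input (index 1); da/dtheta = s.
        theta += 0.01 * critic.grad_wrt_input([s, a], x_prev, k=1) * s
        x_prev, x, s, a = x, x_next, s_next, a_next
    return critic, theta
```

Calling `train()` returns the critic and the learned policy gain. The sketch makes no claim that this toy setup yields a good controller; it only shows how a readout-only TD update and an analytic derivative through the `tanh` reservoir fit together in an actor-critic loop.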

Author information

Authors and Affiliations

Institute of Neural Information Processing, University of Ulm, 89069, Ulm, Germany
Mohamed Oubbati, Johannes Uhlemann & Günther Palm

Editor information

Editors and Affiliations

Informatics Research Centre, University of Skövde, Kanikegränd 3A, 54134, Skövde, Sweden
Tom Ziemke
Lund University, Lundagård, Kungshuset, 22222, Lund, Sweden
Christian Balkenius
Mærsk McKinney Møller Institute, University of Southern Denmark, Campusvej 55, 5230, Odense, Denmark
John Hallam

About this paper

Cite this paper

Oubbati, M., Uhlemann, J., Palm, G. (2012). Adaptive Learning in Continuous Environment Using Actor-Critic Design and Echo-State Networks. In: Ziemke, T., Balkenius, C., Hallam, J. (eds) From Animals to Animats 12. SAB 2012. Lecture Notes in Computer Science, vol 7426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33093-3_32

DOI: https://doi.org/10.1007/978-3-642-33093-3_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33092-6
Online ISBN: 978-3-642-33093-3
eBook Packages: Computer Science, Computer Science (R0)
