Abstract
Approximate (adaptive) dynamic programming has been studied extensively in recent years because of its potential to scale to problems with continuous state and action spaces. The framework of adaptive critic design (ACD) targets such problems and has been demonstrated in several case studies. This paper proposes an implementation of ACD that uses an echo state network (ESN) as the critic. The ESN is trained online to estimate the utility function and to adapt the control policy of an embodied agent. Besides having a simple training algorithm, the ESN has a structure that facilitates backpropagation of the derivatives needed to adapt the controller. Experimental results on a mobile robot are provided to validate the proposed learning architecture.
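The idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `ESNCritic`, the helper `run_online`, the temporal-difference target `utility + gamma * J(t+1)`, the plain gradient (LMS-style) readout update, and all parameter values are assumptions chosen for illustration. The sketch shows two properties the abstract mentions: only the linear readout is trained, and the derivative of the critic output with respect to its input has a closed form that could be passed back to a controller.

```python
import numpy as np


class ESNCritic:
    """Hypothetical ESN critic: fixed random reservoir, trainable linear readout."""

    def __init__(self, n_inputs, n_reservoir=50, spectral_radius=0.9,
                 gamma=0.95, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
        # Rescale so the largest eigenvalue magnitude equals spectral_radius.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.w_out = np.zeros(n_reservoir)  # the only trained parameters
        self.x = np.zeros(n_reservoir)      # reservoir state
        self.gamma = gamma
        self.lr = lr

    def step(self, u):
        """Advance the reservoir with input u and return the critic estimate."""
        self.x = np.tanh(self.W_in @ u + self.W @ self.x)
        return float(self.w_out @ self.x)

    def td_update(self, x_prev, j_prev, utility, j_next):
        """One temporal-difference step on the readout (assumed learning rule)."""
        delta = utility + self.gamma * j_next - j_prev
        self.w_out += self.lr * delta * x_prev
        return delta

    def input_gradient(self, u, x_prev):
        """One-step derivative dJ/du of the critic output with respect to its input.

        Recurrence through earlier time steps is ignored in this sketch.
        """
        x = np.tanh(self.W_in @ u + self.W @ x_prev)
        return self.W_in.T @ (self.w_out * (1.0 - x ** 2))


def run_online(critic, inputs, utilities):
    """Stream inputs and utilities through the critic, updating it at every step."""
    deltas = []
    j_prev = critic.step(inputs[0])
    for u, r in zip(inputs[1:], utilities[:-1]):
        x_prev = critic.x.copy()
        j_next = critic.step(u)
        deltas.append(critic.td_update(x_prev, j_prev, r, j_next))
        # Re-evaluate the current estimate with the updated readout.
        j_prev = float(critic.w_out @ critic.x)
    return deltas
```

In an actor-critic loop, the vector returned by `input_gradient` would be the signal used to adjust the controller's parameters, under the assumption that the critic's input contains the controller's action.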
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oubbati, M., Uhlemann, J., Palm, G. (2012). Adaptive Learning in Continuous Environment Using Actor-Critic Design and Echo-State Networks. In: Ziemke, T., Balkenius, C., Hallam, J. (eds) From Animals to Animats 12. SAB 2012. Lecture Notes in Computer Science, vol 7426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33093-3_32
DOI: https://doi.org/10.1007/978-3-642-33093-3_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33092-6
Online ISBN: 978-3-642-33093-3
eBook Packages: Computer Science, Computer Science (R0)