
Anticipating Rewards in Continuous Time and Space: A Case Study in Developmental Robotics

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4520)

Abstract

This paper presents the first basic principles, implementation, and experimental results of what could be regarded as a new approach to reinforcement learning, in which agents (physical robots interacting with objects and other agents in the real world) learn to anticipate rewards using their sensory inputs. Our approach requires no discretization, notion of events, or classification; instead of learning rewards for every possible action of an agent in every situation, we propose that agents learn only the main situations worth avoiding and reaching. However, the main focus of our work is not reinforcement learning as such, but modeling cognitive development on a small autonomous robot interacting with an “adult” caretaker, typically a human, in the real world; the control architecture follows a Perception-Action approach incorporating a basic homeostatic principle. This interaction occurs in very close proximity, uses very coarse and limited sensory-motor capabilities, and affects the “well-being” and affective state of the robot. The type of anticipatory behavior we are concerned with in this context relates to both sensory and reward anticipation. We have applied and tested our model on a real robot.
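The core idea in the abstract, anticipating reward in a continuous sensory space by remembering only the main situations worth reaching and avoiding, can be illustrated with a toy model. The following sketch is not the authors' implementation: the class, the Gaussian similarity kernel, and all parameter names are assumptions of this example, chosen only to make the principle concrete.

```python
import numpy as np


class RewardAnticipator:
    """Toy sketch: an agent remembers only the sensory situations with the
    best and worst observed well-being, and anticipates the reward of a new
    situation from its similarity to those two prototypes.
    Illustrative only; names and parameters are not from the paper."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma   # width of the Gaussian similarity kernel
        self.best = None     # (sensation, well-being) worth reaching
        self.worst = None    # (sensation, well-being) worth avoiding

    def observe(self, sensation, well_being):
        """Update the two remembered prototypes from a new observation."""
        sensation = np.asarray(sensation, dtype=float)
        if self.best is None or well_being > self.best[1]:
            self.best = (sensation, well_being)
        if self.worst is None or well_being < self.worst[1]:
            self.worst = (sensation, well_being)

    def _similarity(self, a, b):
        # Gaussian kernel on Euclidean distance: 1 at identity, -> 0 far away.
        return np.exp(-np.sum((a - b) ** 2) / (2 * self.sigma ** 2))

    def anticipate(self, sensation):
        """Predicted reward: attraction toward the best-remembered situation
        minus repulsion from the worst, each weighted by similarity."""
        sensation = np.asarray(sensation, dtype=float)
        if self.best is None:
            return 0.0
        return (self.best[1] * self._similarity(sensation, self.best[0])
                - abs(self.worst[1]) * self._similarity(sensation, self.worst[0]))
```

Note that, in line with the abstract, nothing is discretized: sensations are continuous vectors, and the agent stores only two situations rather than a value for every state-action pair.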




Editor information

Martin V. Butz, Olivier Sigaud, Giovanni Pezzulo, Gianluca Baldassarre


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Blanchard, A.J., Cañamero, L. (2007). Anticipating Rewards in Continuous Time and Space: A Case Study in Developmental Robotics. In: Butz, M.V., Sigaud, O., Pezzulo, G., Baldassarre, G. (eds.) Anticipatory Behavior in Adaptive Learning Systems. ABiALS 2006. Lecture Notes in Computer Science (LNAI), vol. 4520. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74262-3_15

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-74262-3_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74261-6

  • Online ISBN: 978-3-540-74262-3

  • eBook Packages: Computer Science, Computer Science (R0)
