Abstract
This paper presents the basic principles, implementation, and first experimental results of what could be regarded as a new approach to reinforcement learning, in which agents (physical robots interacting with objects and other agents in the real world) learn to anticipate rewards from their sensory inputs. Our approach needs no discretization, notion of events, or classification; instead of learning rewards for every possible action of an agent in every situation, we propose that agents learn only the main situations worth avoiding and reaching. However, the main focus of our work is not reinforcement learning as such, but modeling cognitive development on a small autonomous robot interacting with an "adult" caretaker, typically a human, in the real world; the control architecture follows a Perception-Action approach and incorporates a basic homeostatic principle. This interaction occurs in very close proximity, uses very coarse and limited sensory-motor capabilities, and affects the "well-being" and affective state of the robot. The anticipatory behavior we are concerned with in this context involves both sensory and reward anticipation. We have applied and tested our model on a real robot.
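The abstract's central idea (remembering only the main situations worth reaching and avoiding, rather than a value for every action in every discretized state) could be sketched roughly as follows. This is a minimal, hypothetical illustration under our own assumptions, not the paper's actual architecture; the class and method names (`SituationMemory`, `update`, `attraction`) and the prototype-blending rule are invented here for clarity.

```python
import math

class SituationMemory:
    """Remember the sensory situations associated with the highest and lowest
    rewards seen so far, without discretizing the continuous sensory space.
    Hypothetical sketch: not the architecture described in the paper."""

    def __init__(self, lr=0.5):
        self.lr = lr        # blending rate applied when a new extreme is seen
        self.best = None    # prototype of the situation worth reaching
        self.worst = None   # prototype of the situation worth avoiding
        self.r_max = None
        self.r_min = None

    def update(self, sensation, reward):
        s = [float(x) for x in sensation]
        if self.best is None:          # first observation seeds both prototypes
            self.best, self.worst = s[:], s[:]
            self.r_max = self.r_min = reward
            return
        if reward >= self.r_max:       # blend the 'best' prototype toward
            self.r_max = reward        # the newly observed best situation
            self.best = [(1 - self.lr) * b + self.lr * x
                         for b, x in zip(self.best, s)]
        if reward <= self.r_min:       # likewise for the 'worst' prototype
            self.r_min = reward
            self.worst = [(1 - self.lr) * w + self.lr * x
                          for w, x in zip(self.worst, s)]

    def attraction(self, sensation):
        """Signed preference: positive when the current sensation is closer
        to the remembered 'best' situation than to the 'worst' one."""
        s = [float(x) for x in sensation]
        return math.dist(s, self.worst) - math.dist(s, self.best)
```

With `lr=1.0` each new extreme simply overwrites the corresponding prototype; smaller rates smooth over noisy reward signals. The `attraction` value could then bias action selection toward the remembered good situation, anticipating reward from sensory input alone.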
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Blanchard, A.J., Cañamero, L. (2007). Anticipating Rewards in Continuous Time and Space: A Case Study in Developmental Robotics. In: Butz, M.V., Sigaud, O., Pezzulo, G., Baldassarre, G. (eds) Anticipatory Behavior in Adaptive Learning Systems. ABiALS 2006. Lecture Notes in Computer Science(), vol 4520. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74262-3_15
Print ISBN: 978-3-540-74261-6
Online ISBN: 978-3-540-74262-3