
Anticipating Rewards in Continuous Time and Space: A Case Study in Developmental Robotics

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4520)

Abstract

This paper presents the first basic principles, implementation, and experimental results of what could be regarded as a new approach to reinforcement learning, in which agents (physical robots interacting with objects and other agents in the real world) learn to anticipate rewards using their sensory inputs. Our approach requires no discretization, notion of events, or classification; instead of learning rewards for every possible action of an agent in every situation, we propose that agents learn only the main situations worth avoiding and reaching. However, the main focus of our work is not reinforcement learning as such, but modeling cognitive development on a small autonomous robot interacting with an “adult” caretaker, typically a human, in the real world; the control architecture follows a Perception-Action approach incorporating a basic homeostatic principle. This interaction occurs in very close proximity, uses very coarse and limited sensory-motor capabilities, and affects the “well-being” and affective state of the robot. The type of anticipatory behavior we are concerned with in this context relates to both sensory and reward anticipation. We have applied and tested our model on a real robot.
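The core idea in the abstract, anticipating reward in a continuous sensory space by remembering only the main situations worth reaching and avoiding, can be illustrated with a toy model. The following sketch is not the authors' implementation: the class, the Gaussian similarity kernel, and all parameter names are assumptions of this example, chosen only to make the principle concrete.

```python
import numpy as np


class RewardAnticipator:
    """Toy sketch: an agent remembers only the sensory situations with the
    best and worst observed well-being, and anticipates the reward of a new
    situation from its similarity to those two prototypes.
    Illustrative only; names and parameters are not from the paper."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma   # width of the Gaussian similarity kernel
        self.best = None     # (sensation, well-being) worth reaching
        self.worst = None    # (sensation, well-being) worth avoiding

    def observe(self, sensation, well_being):
        """Update the two remembered prototypes from a new observation."""
        sensation = np.asarray(sensation, dtype=float)
        if self.best is None or well_being > self.best[1]:
            self.best = (sensation, well_being)
        if self.worst is None or well_being < self.worst[1]:
            self.worst = (sensation, well_being)

    def _similarity(self, a, b):
        # Gaussian kernel on Euclidean distance: 1 at identity, -> 0 far away.
        return np.exp(-np.sum((a - b) ** 2) / (2 * self.sigma ** 2))

    def anticipate(self, sensation):
        """Predicted reward: attraction toward the best-remembered situation
        minus repulsion from the worst, each weighted by similarity."""
        sensation = np.asarray(sensation, dtype=float)
        if self.best is None:
            return 0.0
        return (self.best[1] * self._similarity(sensation, self.best[0])
                - abs(self.worst[1]) * self._similarity(sensation, self.worst[0]))
```

Note that, in line with the abstract, nothing is discretized: sensations are continuous vectors, and the agent stores only two situations rather than a value for every state-action pair.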




Editor information

Martin V. Butz, Olivier Sigaud, Giovanni Pezzulo, Gianluca Baldassarre


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Blanchard, A.J., Cañamero, L. (2007). Anticipating Rewards in Continuous Time and Space: A Case Study in Developmental Robotics. In: Butz, M.V., Sigaud, O., Pezzulo, G., Baldassarre, G. (eds.) Anticipatory Behavior in Adaptive Learning Systems. ABiALS 2006. Lecture Notes in Computer Science (LNAI), vol. 4520. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74262-3_15

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-74262-3_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74261-6

  • Online ISBN: 978-3-540-74262-3

  • eBook Packages: Computer Science, Computer Science (R0)
