Neural Reinforcement Learning for Robot Navigation

  • S. Sehad
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 21)


Reinforcement Learning (RL) is an attractive approach for robot learning since it allows an agent to learn a given behavior from an evaluation of that behavior. This measure corresponds to a qualitative evaluation of the agent's behavior. An agent here means either a simulated system in a virtual world or a real agent interacting with the real world. The agent perceives situations of the world with its sensors and acts in the world with its motors. Figure 1 illustrates what we mean by an agent in this chapter. Experiments in learning an obstacle-avoidance behavior for the robot Khepera are presented. It is shown that neural RL is more suitable for real-world applications.
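
To make the perceive-act-reinforce loop described above concrete, here is a minimal sketch. It is not the chapter's neural method: it substitutes a simple tabular Q-learning agent purely for illustration. The environment interface (read_sensors, apply_motors), the binary sensor discretization, and the reward shaping are assumptions introduced here, loosely modeled on a Khepera-like robot with infrared proximity sensors.

```python
# Minimal sketch (NOT the chapter's neural RL algorithm): a tabular Q-learning
# agent for obstacle avoidance. The environment object `env` with
# read_sensors() / apply_motors() and the reward shaping are illustrative
# assumptions only.
import random
from collections import defaultdict

ACTIONS = ["forward", "turn_left", "turn_right"]

def discretize(sensor_values, threshold=0.5):
    """Map proximity readings (0 = free, 1 = obstacle very close) to a coarse binary state."""
    return tuple(1 if v > threshold else 0 for v in sensor_values)

def reward(sensor_values, action):
    """Qualitative evaluation of the behavior: penalize proximity to obstacles,
    give a small bonus for moving forward in free space."""
    if max(sensor_values) > 0.9:        # (almost) touching an obstacle
        return -1.0
    return 0.1 if action == "forward" else 0.0

def q_learning_episode(env, q, alpha=0.1, gamma=0.9, epsilon=0.1, steps=1000):
    """One learning episode: perceive, act, receive reinforcement, update."""
    state = discretize(env.read_sensors())
    for _ in range(steps):
        if random.random() < epsilon:                        # explore
            action = random.choice(ACTIONS)
        else:                                                # exploit
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        env.apply_motors(action)
        next_sensors = env.read_sensors()
        next_state = discretize(next_sensors)
        r = reward(next_sensors, action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        state = next_state

q_table = defaultdict(float)   # value 0.0 for every unseen (state, action) pair
```

A neural version, as studied in the chapter, would replace the table `q_table` with a network that generalizes over sensor situations, which matters once the raw sensor space is too large to enumerate.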


Keywords: Reinforcement Learning, Soft Computing, Robot Navigation, Heuristic Function, Goal Position




Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • S. Sehad
  1. GRIB, University of Pierre and Marie Curie, Paris, France
