Developing Rational Agents

  • Marco Colombetti
  • Pier Luca Lanzi


An agent is a system that interacts continually with an environment, without human assistance, in order to carry out a predefined task. We are interested in developing artificial agents that act rationally, in the sense that they maximize a suitable utility function. In this chapter, we describe the main problems underlying the realization of rational agents and present commonly adopted mathematical models. In particular, we consider the case in which the environment can be modeled as a finite-state stochastic process, and we address the problem of developing agents that learn to act rationally from their own experience.
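A standard way to make this concrete is tabular Q-learning, in which the agent maintains a table of action values Q(s, a) for a finite-state environment and updates it from observed transitions. The sketch below is illustrative only: the two-state environment, its reward structure, and the hyperparameter values (alpha, gamma, epsilon) are assumptions chosen for the example, not taken from the chapter.

```python
import random

def step(state, action):
    # Toy two-state environment (an assumption for this sketch):
    # in state 0, action 1 moves the agent to state 1;
    # in state 1, action 1 yields reward 1 and resets to state 0;
    # any other choice leaves the agent in state 0 with no reward.
    if state == 0 and action == 1:
        return 1, 0.0
    if state == 1 and action == 1:
        return 0, 1.0
    return 0, 0.0

def q_learning(steps=5000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}  # action-value table
    state = 0
    for _ in range(steps):
        # Epsilon-greedy selection: explore with probability epsilon,
        # otherwise act greedily with respect to the current table.
        if rng.random() < epsilon:
            action = rng.choice((0, 1))
        else:
            action = max((0, 1), key=lambda a: q[(state, a)])
        next_state, reward = step(state, action)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        target = reward + gamma * max(q[(next_state, a)] for a in (0, 1))
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q
```

After enough interaction, the greedy policy derived from the table (choose the action with the largest Q value in each state) reaches the rewarding transition; the epsilon-greedy rule is one common way to balance exploring unfamiliar actions against exploiting what has already been learned.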







Copyright information

© Springer Science+Business Media New York 2001

Authors and Affiliations

  • Marco Colombetti ¹
  • Pier Luca Lanzi ¹

  1. Department of Electronics and Information, Politecnico di Milano, Milano, Italy
