On the significance of Markov decision processes

Conference paper
Artificial Neural Networks — ICANN'97 (ICANN 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1327)

Abstract

Formulating the problem facing an intelligent agent as a Markov decision process (MDP) is increasingly common in artificial intelligence, reinforcement learning, artificial life, and artificial neural networks. In this short paper we examine some of the reasons for the appeal of this framework. Foremost among these are its generality, simplicity, and emphasis on goal-directed interaction between the agent and its environment. MDPs may be becoming a common focal point for different approaches to understanding the mind. Finally, we speculate that this focus may be an enduring one insofar as many of the efforts to extend the MDP framework end up bringing a wider class of problems back within it.
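As a concrete illustration of the framework the abstract refers to (a minimal sketch, not taken from the paper itself): an MDP is given by a set of states, a set of actions, transition probabilities P(s'|s,a), rewards, and a discount factor, and value iteration solves it by repeatedly applying the Bellman optimality backup. The two-state problem below is hypothetical and chosen only to make the pieces concrete.

    # Minimal value-iteration sketch for a hypothetical two-state MDP
    # (illustrative only; not taken from the paper).
    # P[s][a] lists (next_state, probability, reward) triples.
    P = {
        0: {0: [(0, 1.0, 0.0)],                  # "stay" in state 0: no reward
            1: [(1, 0.8, 1.0), (0, 0.2, 0.0)]},  # "move": usually reach state 1, reward +1
        1: {0: [(1, 1.0, 2.0)],                  # "stay" in state 1: +2 per step
            1: [(0, 1.0, 0.0)]},                 # "move" back: no reward
    }
    gamma = 0.9                # discount factor
    V = {s: 0.0 for s in P}    # initial value estimates

    # Bellman optimality backup:
    #   V(s) <- max_a sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * V(s'))
    for _ in range(1000):
        V = {s: max(sum(p * (r + gamma * V[s2]) for s2, p, r in outs)
                    for outs in P[s].values())
             for s in P}

    print(V)   # converges to roughly V[0] = 18.5, V[1] = 20.0

Value iteration as sketched here assumes the transition and reward model is known; reinforcement-learning methods instead estimate the same optimal values from sampled interaction with the environment, which is what makes the goal-directed agent-environment framing described in the abstract useful.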

Editor information

Wulfram Gerstner, Alain Germond, Martin Hasler, Jean-Daniel Nicoud

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sutton, R.S. (1997). On the significance of Markov decision processes. In: Gerstner, W., Germond, A., Hasler, M., Nicoud, JD. (eds) Artificial Neural Networks — ICANN'97. ICANN 1997. Lecture Notes in Computer Science, vol 1327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0020167

  • DOI: https://doi.org/10.1007/BFb0020167

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63631-1

  • Online ISBN: 978-3-540-69620-9

  • eBook Packages: Springer Book Archive
