
Reinforcement Learning in Nonstationary Environment Navigation Tasks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4509)

Abstract

The field of reinforcement learning (RL) has made great strides in learning control knowledge from closed-loop interaction with environments. “Classical” RL, based on atomic state space representations, suffers from an inability to adapt to nonstationarities in the target Markov decision process (i.e., environment). Relational RL is widely seen as a potential solution to this shortcoming. In this paper, we demonstrate a class of “pseudo-relational” learning methods for nonstationary navigational RL domains – domains in which the location of the goal, or even the structure of the environment, can change over time. Our approach is closely related to deictic representations, which have previously been found to be troublesome for RL. The key insight of this paper is that navigational problems are a highly constrained class of MDPs, possessing a strong native topology that relaxes some of the partial observability difficulties arising from deixis. Agents can employ local information that is relevant to their near-term action choices to act effectively. We demonstrate that, unlike an atomic representation, our agents can learn to fluidly adapt to changing goal locations and environment structure.
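The core idea of the abstract, acting on local, goal-relative (“deictic”) information rather than absolute atomic states so that a navigating agent keeps working when the goal moves, can be illustrated with a small sketch. The following is not the authors' algorithm: it is a minimal tabular Q-learning example in a hypothetical gridworld whose goal relocates periodically. Here `deictic_obs` (the sign of the offset to the goal) is a crude stand-in for the paper's richer local features, `atomic_obs` is the absolute-position baseline, and every name and parameter is an illustrative assumption.

```python
import random
from collections import defaultdict

# Toy nonstationary gridworld: the goal relocates every `goal_period` episodes.
# Illustrative sketch only; not the representation or algorithm from the paper.

SIZE = 8
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def sign(x):
    return (x > 0) - (x < 0)


def deictic_obs(agent, goal):
    # Goal-relative ("pseudo-relational") observation: direction to the goal,
    # independent of absolute coordinates.
    return (sign(goal[0] - agent[0]), sign(goal[1] - agent[1]))


def atomic_obs(agent, goal):
    # Atomic baseline: absolute agent position only; the goal location is
    # visible to the learner only through rewards.
    return agent


def q_learning(obs_fn, episodes=3000, alpha=0.2, gamma=0.95, eps=0.1,
               goal_period=500):
    Q = defaultdict(float)
    goal = (SIZE - 1, SIZE - 1)
    returns = []
    for ep in range(episodes):
        if ep % goal_period == 0:          # nonstationarity: goal moves
            goal = (random.randrange(SIZE), random.randrange(SIZE))
        agent = (0, 0)
        total = 0.0
        for _ in range(4 * SIZE):
            s = obs_fn(agent, goal)
            if random.random() < eps:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[(s, i)])
            dx, dy = ACTIONS[a]
            agent = (min(max(agent[0] + dx, 0), SIZE - 1),
                     min(max(agent[1] + dy, 0), SIZE - 1))
            r = 1.0 if agent == goal else -0.01
            total += r
            s2 = obs_fn(agent, goal)
            best = max(Q[(s2, i)] for i in range(len(ACTIONS)))
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            if agent == goal:
                break
        returns.append(total)
    # Average return over the final block of episodes.
    return sum(returns[-goal_period:]) / goal_period


if __name__ == "__main__":
    random.seed(0)
    print("deictic:", q_learning(deictic_obs))
    print("atomic: ", q_learning(atomic_obs))
```

Under this toy setup the deictic learner's value table transfers across goal relocations because its observation space never mentions absolute coordinates, whereas the atomic learner must relearn after each change; this is only meant to make the abstract's contrast between local and atomic representations concrete.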




Editor information

Ziad Kobti, Dan Wu


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Lane, T., Ridens, M., Stevens, S. (2007). Reinforcement Learning in Nonstationary Environment Navigation Tasks. In: Kobti, Z., Wu, D. (eds) Advances in Artificial Intelligence. Canadian AI 2007. Lecture Notes in Computer Science, vol 4509. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72665-4_37


  • DOI: https://doi.org/10.1007/978-3-540-72665-4_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72664-7

  • Online ISBN: 978-3-540-72665-4

  • eBook Packages: Computer Science, Computer Science (R0)
