Adaptive Look-Ahead Planning

  • Sebastian Thrun
  • Knut Möller
  • Alexander Linden
Conference paper
Part of the Informatik-Fachberichte book series (INFORMATIK, volume 252)

Abstract

We present a new adaptive connectionist planning method. Through interaction with its environment, the planner progressively constructs a world model using the backpropagation learning algorithm. It then builds a look-ahead plan by iteratively applying this model to predict future reinforcement. Predicted future reinforcement is maximized by gradient descent in action space, yielding suboptimal plans and thereby deriving good actions directly from the knowledge encoded in the model network (strategic level).
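As a reading aid, here is a minimal sketch of the strategic level in JAX, assuming a differentiable world model that predicts successor state and reinforcement. The toy dynamics, reinforcement function, step sizes, and all names are illustrative assumptions, not the paper's implementation:

```python
# Illustrative sketch (not the paper's code): gradient-based optimization in
# action space through a differentiable world model, as described above.
import jax
import jax.numpy as jnp

def world_model(state, action):
    """Stand-in for the learned model network: predicts the successor
    state and the reinforcement for taking `action` in `state`."""
    next_state = state + action                 # toy dynamics (assumed learned)
    reinforcement = -jnp.sum(next_state ** 2)   # toy reinforcement: reach origin
    return next_state, reinforcement

def plan_return(actions, state):
    """Unroll the model along a look-ahead plan, summing predicted reinforcement."""
    total = 0.0
    for a in actions:                            # actions: (T, action_dim) array
        state, r = world_model(state, a)
        total = total + r
    return total

grad_fn = jax.grad(plan_return)                  # gradient w.r.t. the whole plan

def refine_plan(actions, state, steps=50, lr=0.1):
    """Maximize predicted future reinforcement by ascending its gradient
    (equivalently, gradient descent on the negated reinforcement)."""
    for _ in range(steps):
        actions = actions + lr * grad_fn(actions, state)
    return actions

initial_state = jnp.array([1.0, -2.0])
plan = refine_plan(jnp.zeros((5, 2)), initial_state)  # 5-step look-ahead plan
```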

The problem of finding good initial plans is addressed by an “experience” network (intuition level). The suitability of this planning method for finding suboptimal actions in unknown environments is demonstrated on a target tracking problem.
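In the same vein, a hypothetical sketch of the intuition level: an experience network trained to map a situation to the plan the strategic level eventually settled on, so that later searches start from a good initial plan. The architecture, loss, and training rule here are assumptions for illustration only:

```python
# Hypothetical sketch (not the paper's code): an "experience" network that
# learns to propose initial plans from refined plans found by the planner.
import jax
import jax.numpy as jnp

def init_params(key, state_dim=2, plan_dim=10, hidden=16):
    k1, k2 = jax.random.split(key)
    return {"W1": 0.1 * jax.random.normal(k1, (state_dim, hidden)),
            "b1": jnp.zeros(hidden),
            "W2": 0.1 * jax.random.normal(k2, (hidden, plan_dim)),
            "b2": jnp.zeros(plan_dim)}

def propose_plan(params, state):
    """Map a state to a flattened initial plan (intuition level)."""
    h = jnp.tanh(state @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def loss(params, state, refined_plan):
    """Squared error against the plan found by the look-ahead search."""
    return jnp.sum((propose_plan(params, state) - refined_plan.ravel()) ** 2)

@jax.jit
def train_step(params, state, refined_plan, lr=0.01):
    grads = jax.grad(loss)(params, state, refined_plan)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```

At planning time, the proposed plan would replace the zero initialization in the gradient search sketched above and then be refined further by the strategic level.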

Keywords

Planning · reinforcement learning · temporal credit assignment problem · gradient descent · target tracking

Copyright information

© Springer-Verlag Berlin Heidelberg 1990

Authors and Affiliations

  • Sebastian Thrun (1, 2)
  • Knut Möller (2)
  • Alexander Linden (1)

  1. German National Research Center for Computer Science (GMD), St. Augustin, Germany
  2. Department of Computer Science, University of Bonn, Bonn, Germany
