Temporal Difference Learning: A Chemical Process Control Application

  • Chapter
Applications of Neural Networks

Abstract

Learning to control can be considered a trial-and-error process in which the controlling agent explores the consequences of various actions. Actions that produce good results are reinforced, while those that produce bad results are suppressed. Eventually, the best control actions become dominant over all others, yielding an optimal solution to the control problem. Central to this approach is an appropriate performance measure, or reinforcement function, that can distinguish good from bad consequences among possible control actions. Often, the control objective is specified as an operating setpoint, suggesting a simple reinforcement function based on distance to the setpoint: actions that move the state closer to the setpoint receive relatively higher reinforcement, while those that move it farther away receive relatively lower reinforcement. Control of dynamical systems is complicated by time lags between control actions and their eventual consequences. In such systems it is sometimes undesirable to move too rapidly toward the setpoint; because of the lag between actions and consequences, it may be impossible to slow down in time once the controlled variable is moving rapidly toward the setpoint, and the result is overshoot. A controller that relies on a reinforcement function based only on distance to the setpoint may never learn to control at all. Rather, it will approach the setpoint from one side, overshoot, approach from the other side, overshoot again, and thus oscillate forever. The problem is that such a reinforcement function considers only the local, short-term consequences of the controller's actions, whereas we really want the controller to choose actions based on their long-term consequences.
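To make the distinction concrete, the sketch below contrasts a purely local, distance-to-setpoint reinforcement signal with a tabular TD(0) value update, in which a delayed consequence such as overshoot is propagated back to earlier states through the discounted value of the successor state. This is a minimal illustration, not the authors' controller: the setpoint, state range, discretization, learning rate, and discount factor are all assumptions made for the example.

```python
import numpy as np

# Minimal sketch, not the authors' implementation: a local distance-to-setpoint
# reinforcement signal plus a tabular TD(0) value update. The setpoint, state
# range, bin count, learning rate, and discount factor are illustrative assumptions.

SETPOINT = 1.0   # desired operating point of the controlled variable (assumed)
ALPHA = 0.1      # learning rate (assumed)
GAMMA = 0.95     # discount factor weighting long-term consequences (assumed)
N_BINS = 50      # discretization of the controlled variable (assumed)

values = np.zeros(N_BINS)  # estimated long-term reinforcement for each state


def reinforcement(x):
    """Local reinforcement based only on distance to the setpoint:
    states closer to the setpoint receive higher values."""
    return -abs(x - SETPOINT)


def state_index(x, lo=0.0, hi=2.0):
    """Map the continuous controlled variable onto a discrete state bin."""
    frac = (np.clip(x, lo, hi) - lo) / (hi - lo)
    return min(int(frac * N_BINS), N_BINS - 1)


def td0_update(x, x_next):
    """One TD(0) step: move the value of the current state toward the
    immediate reinforcement plus the discounted value of the successor,
    so that delayed consequences (e.g. overshoot) propagate back to the
    states and actions that caused them."""
    s, s_next = state_index(x), state_index(x_next)
    target = reinforcement(x_next) + GAMMA * values[s_next]
    values[s] += ALPHA * (target - values[s])
    return values[s]
```

A controller that ranks candidate actions by the learned value of their predicted successor states, rather than by the immediate distance-based reinforcement alone, can in principle learn to slow down before reaching the setpoint instead of oscillating around it.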

Copyright information

© 1995 Springer Science+Business Media New York

About this chapter

Cite this chapter

Miller, S., Williams, R.J. (1995). Temporal Difference Learning: A Chemical Process Control Application. In: Murray, A.F. (eds) Applications of Neural Networks. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-2379-3_12

  • DOI: https://doi.org/10.1007/978-1-4757-2379-3_12

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5140-3

  • Online ISBN: 978-1-4757-2379-3
