# A Note on Liao’s Recurrent Neural-Network Learning for Discrete Multi-stage Optimal Control Problems

## Abstract

The roots of neural-network backpropagation (BP) may be traced back to classical optimal-control gradient procedures developed in the early 1960s. Hence, BP applies directly to a general discrete *N*-stage optimal control problem that consists of *N* stage costs plus a terminal state cost. In this journal (Liao in Neural Process Lett 10:195–200, 1999), given such a multi-stage optimal control problem, Liao turned it into a problem involving a terminal state cost only (via the classical transformation), and then claimed that BP on the transformed problem leads to *new recurrent neural network* learning. The purpose of this paper is three-fold. First, we show that the classical terminal-cost transformation yields no particular benefit for BP. Second, we show that the two simulation examples demonstrated by Liao (with and without time lag) can be regarded naturally as deep feed-forward neural-network learning, rather than as recurrent neural-network learning, from the perspective of classical optimal-control gradient methods. Third, we show that BP can readily deal with a general history-dependent optimal control problem (e.g., one involving *time-lagged state and control variables*) owing to Dreyfus’s 1973 extension of BP. Throughout the paper, we highlight systematic BP derivations by employing the recurrence relation of nominal cost-to-go action-value functions based on the *stage-wise concept of dynamic programming*.
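The first point of the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the scalar dynamics `f`, stage cost `stage_cost`, and terminal cost `terminal_cost` below are hypothetical choices made only for illustration. The sketch runs the backward sweep once on the original problem (stage costs plus terminal cost) and once on the transformed problem, in which an assumed extra state `y` accumulates the stage costs so that only a terminal cost remains. Under these assumptions the two sweeps return the same gradient with respect to the controls.

```python
import math

# Hypothetical scalar problem (illustration only, not from the paper):
# x_{s+1} = f(x_s, u_s),  J = sum_s L(x_s, u_s) + phi(x_N)
def f(x, u):
    return 0.9 * x + 0.5 * math.tanh(u)

def f_x(x, u):
    return 0.9

def f_u(x, u):
    return 0.5 * (1.0 - math.tanh(u) ** 2)

def stage_cost(x, u):
    return 0.5 * (x * x + u * u)

def L_x(x, u):
    return x

def L_u(x, u):
    return u

def terminal_cost(x):
    return 0.5 * x * x

def phi_x(x):
    return x

def rollout(x0, controls):
    """Forward pass: return the state sequence x_0, ..., x_N."""
    xs = [x0]
    for u in controls:
        xs.append(f(xs[-1], u))
    return xs

def total_cost(x0, controls):
    xs = rollout(x0, controls)
    running = sum(stage_cost(x, u) for x, u in zip(xs, controls))
    return running + terminal_cost(xs[-1])

def bp_gradient(x0, controls):
    """Backward sweep on the original problem (stage costs + terminal cost)."""
    xs = rollout(x0, controls)
    lam = phi_x(xs[-1])                       # dJ/dx_N
    grad = [0.0] * len(controls)
    for s in reversed(range(len(controls))):
        x, u = xs[s], controls[s]
        grad[s] = L_u(x, u) + lam * f_u(x, u)  # dJ/du_s
        lam = L_x(x, u) + lam * f_x(x, u)      # dJ/dx_s
    return grad

def bp_gradient_terminal_only(x0, controls):
    """Same sweep after adding a state y_{s+1} = y_s + L(x_s, u_s), y_0 = 0,
    so that the cost is the terminal cost phi(x_N) + y_N alone."""
    xs = rollout(x0, controls)
    lam_x = phi_x(xs[-1])   # sensitivity of the terminal cost to x_N
    lam_y = 1.0             # sensitivity of the terminal cost to y_N
    grad = [0.0] * len(controls)
    for s in reversed(range(len(controls))):
        x, u = xs[s], controls[s]
        grad[s] = lam_x * f_u(x, u) + lam_y * L_u(x, u)
        lam_x = lam_x * f_x(x, u) + lam_y * L_x(x, u)
        lam_y = lam_y * 1.0  # y feeds its own successor with unit slope
    return grad

def fd_gradient(x0, controls, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    grad = []
    for s in range(len(controls)):
        up = list(controls)
        dn = list(controls)
        up[s] += eps
        dn[s] -= eps
        grad.append((total_cost(x0, up) - total_cost(x0, dn)) / (2 * eps))
    return grad
```

In this sketch the sensitivity attached to the extra state `y` stays equal to one at every stage, so the transformed sweep reduces term by term to the original one.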

## Keywords

- Backpropagation
- Optimal control gradient methods
- Deep feed-forward neural-network learning

## Notes

### Acknowledgements

Eiji Mizutani would like to thank Stuart Dreyfus (UC Berkeley) for numerous invaluable discussions on neural network learning and dynamic programming for more than two decades. The work is partially supported by the Ministry of Science and Technology, Taiwan (Grant: 106-2221-E-011-146-MY2).

## References

- 1. Bellman RE, Dreyfus SE (1962) Applied dynamic programming. Princeton University Press, Princeton, pp 348–353
- 2. Bliss GA (1946) Lectures on the calculus of variations, Chapter VII. The University of Chicago Press, Chicago
- 3. Bryson AE (1961) A gradient method for optimizing multi-stage allocation processes. In: Proceedings of Harvard University symposium on digital computers and their applications, pp 125–135
- 4. Demmel JW (1997) Applied numerical linear algebra. SIAM, Philadelphia
- 5. Dreyfus SE (1962) The numerical solution of variational problems. J Math Anal Appl 5(1):30–45
- 6. Dreyfus SE (1966) The numerical solution of non-linear optimal control problems. In: Greenspan D (ed) Numerical solutions of nonlinear differential equations: proceedings of an advanced symposium. Wiley, London, pp 97–113
- 7. Dreyfus SE (1973) The computational solution of optimal control problems with time lag. IEEE Trans Automat Control 18(4):383–385
- 8. Dreyfus S, Law A (1977) The art and theory of dynamic programming. Academic Press, London, pp 103–105
- 9. Dreyfus SE (1990) Artificial neural networks, back propagation, and the Kelley–Bryson gradient procedure. J Guid Control Dyn 13(5):926–928
- 10. Kelley HJ (1960) Gradient theory of optimal flight paths. Am Rocket Soc J 30(10):941–954
- 11. Liao LZ (1999) A recurrent neural network for \(N\)-stage optimal control problems. Neural Process Lett 10:195–200
- 12. Liao LZ, Shoemaker CA (1991) Convergence in unconstrained discrete-time differential dynamic programming. IEEE Trans Autom Control 36:692–706
- 13. Mizutani E, Dreyfus S, Nishio K (2000) On derivation of MLP backpropagation from the Kelley–Bryson optimal-control gradient formula and its application. In: Proceedings of the IEEE international conference on neural networks, Como, Italy (vol 2), pp 167–172 (see also http://queue.ieor.berkeley.edu/People/Faculty/dreyfus-pubs/hidteach.m)
- 14. Mizutani E, Dreyfus SE (2006) On derivation of stage-wise second-order backpropagation by invariant imbedding for multi-stage neural-network learning. In: Proceedings of the IEEE World congress on computational intelligence, Vancouver, Canada, pp 4762–4769
- 15. Mizutani E (2015) On Pantoja’s problem allegedly showing a distinction between differential dynamic programming and stage-wise Newton methods. Int J Control 88(9):1702–1722
- 16. Mizutani E, Dreyfus SE (2017) Totally model-free actor-critic recurrent neural-network reinforcement learning in non-Markovian domains. Ann Oper Res 258(1):107–131
- 17. Nocedal J, Wright SJ (2006) Numerical optimization, 2nd edn. Springer, London
- 18. Parisini T, Zoppoli R (1991) Neural networks for the solution of \(N\)-stage optimal control problems. In: Kohonen T, Makisara K, Simula O, Kangas J (eds) Artif Neural Netw. Elsevier Science Publishers B.V., North-Holland, pp 333–338
- 19. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL (eds) Parallel distributed processing, vol 1. MIT Press, Cambridge, pp 318–362
- 20. Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
- 21. Sabouri KJ, Effati S, Pakdaman M (2017) A neural network approach for solving a class of fractional optimal control problems. Neural Process Lett 45:59–74
- 22. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction, 2nd edn. MIT Press, Cambridge
- 23. Werbos PJ (1988) Generalization of backpropagation with application to a recurrent gas market model. Neural Netw 1(4):339–356
- 24. Wilamowski BM, Yu H (2010) Neural network learning without backpropagation. IEEE Trans Neural Netw 21(11):1793–1803