A Note on Liao’s Recurrent Neural-Network Learning for Discrete Multi-stage Optimal Control Problems
The roots of neural-network backpropagation (BP) may be traced back to classical optimal-control gradient procedures developed in the early 1960s. Hence, BP applies directly to a general discrete N-stage optimal control problem that consists of N stage costs plus a terminal state cost. In this journal (Liao in Neural Process Lett 10:195–200, 1999), given such a multi-stage optimal control problem, Liao turned it into a problem involving a terminal state cost only (via a classical transformation), and then claimed that BP on the transformed problem leads to new recurrent neural-network learning. This paper makes three points. First, the classical terminal-cost transformation yields no particular benefit for BP. Second, from the perspective of classical optimal-control gradient methods, the two simulation examples demonstrated by Liao (with and without time lag) can be regarded naturally as deep feed-forward neural-network learning rather than as recurrent neural-network learning. Third, BP can readily deal with a general history-dependent optimal control problem (e.g., one involving time-lagged state and control variables) owing to Dreyfus’s 1973 extension of BP. Throughout the paper, we highlight systematic BP derivations that employ the recurrence relation of nominal cost-to-go action-value functions based on the stage-wise concept of dynamic programming.
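As a rough illustration of the classical gradient procedure the abstract refers to, the following minimal sketch computes the gradient of an N-stage cost (N stage costs plus a terminal state cost) with respect to each control by one forward rollout and one backward sweep. The scalar dynamics `f`, the quadratic `stage_cost` and `terminal_cost`, and all function names are hypothetical choices made for this example; they are not taken from the paper or from Liao's simulations.

```python
import math

def f(x, u):
    # Hypothetical scalar dynamics: x_{s+1} = 0.9 * x_s + tanh(u_s)
    return 0.9 * x + math.tanh(u)

def stage_cost(x, u):
    # Hypothetical stage cost L(x_s, u_s)
    return 0.5 * (x * x + u * u)

def terminal_cost(x):
    # Hypothetical terminal state cost phi(x_N)
    return 0.5 * x * x

def rollout(x0, controls):
    # Forward pass: returns the states x_0, ..., x_N
    xs = [x0]
    for u in controls:
        xs.append(f(xs[-1], u))
    return xs

def total_cost(x0, controls):
    # Sum of the N stage costs plus the terminal state cost
    xs = rollout(x0, controls)
    return sum(stage_cost(x, u) for x, u in zip(xs, controls)) + terminal_cost(xs[-1])

def control_gradient(x0, controls):
    # Backward pass: lam holds the derivative of the cost-to-go
    # with respect to the state at the next stage.
    xs = rollout(x0, controls)
    lam = xs[-1]                      # d(phi)/d(x_N) for phi = 0.5 * x^2
    grad = [0.0] * len(controls)
    for s in reversed(range(len(controls))):
        x, u = xs[s], controls[s]
        f_x = 0.9                     # df/dx
        f_u = 1.0 - math.tanh(u) ** 2 # df/du
        grad[s] = u + f_u * lam       # L_u + f_u * lam_{s+1}
        lam = x + f_x * lam           # lam_s = L_x + f_x * lam_{s+1}
    return grad
```

Calling `control_gradient(1.0, [0.3, -0.2, 0.5, 0.1])` returns one partial derivative per stage, which can be checked against finite differences of `total_cost`. In this sketch, augmenting the state with a running-cost component would only add a costate that stays equal to 1 throughout the backward sweep, so the resulting control gradients would be the same.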
Keywords: Backpropagation; Optimal control gradient methods; Deep feed-forward neural-network learning
Eiji Mizutani would like to thank Stuart Dreyfus (UC Berkeley) for numerous invaluable discussions on neural network learning and dynamic programming for more than two decades. The work is partially supported by the Ministry of Science and Technology, Taiwan (Grant: 106-2221-E-011-146-MY2).
- 3. Bryson AE (1961) A gradient method for optimizing multi-stage allocation processes. In: Proceedings of Harvard University symposium on digital computers and their applications, pp 125–135
- 6. Dreyfus SE (1966) The numerical solution of non-linear optimal control problems. In: Greenspan D (ed) Numerical solutions of nonlinear differential equations: proceedings of an advanced symposium. Wiley, London, pp 97–113
- 13. Mizutani E, Dreyfus S, Nishio K (2000) On derivation of MLP backpropagation from the Kelley–Bryson optimal-control gradient formula and its application. In: Proceedings of the IEEE international conference on neural networks, Como, Italy (vol 2), pp 167–172 (see also http://queue.ieor.berkeley.edu/People/Faculty/dreyfus-pubs/hidteach.m)
- 14. Mizutani E, Dreyfus SE (2006) On derivation of stage-wise second-order backpropagation by invariant imbedding for multi-stage neural-network learning. In: Proceedings of the IEEE World congress on computational intelligence, Vancouver, Canada, pp 4762–4769
- 18. Parisini T, Zoppoli R (1991) Neural networks for the solution of \(N\)-stage optimal control problems. In: Kohonen T, Makisara K, Simula O, Kangas J (eds) Artif Neural Netw. Elsevier Science Publishers B.V., North-Holland, pp 333–338
- 19. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL (eds) Parallel distributed processing, vol 1. MIT Press, Cambridge, pp 318–362
- 22. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction, 2nd edn. MIT Press, Cambridge