Fast Gradient Based Off-Line Training of Multilayer Perceptrons
Fast off-line training of Multilayer Perceptrons (MLPs) using gradient based algorithms is discussed. Simple Back Propagation and Batch Back Propagation (BBP) follow from viewing training as an unconstrained optimization problem. The inefficiencies of these methods are demonstrated with the aid of a number of test problems and are used to justify the investigation of more powerful, second-order optimization techniques such as Conjugate Gradient (CG), Full Memory BFGS (FM) and Limited Memory BFGS (LM). Training with these techniques is at least an order of magnitude faster than with standard BBP, with the FM algorithm proving vastly superior to the others, giving speed-ups of between 100 and 1000 depending on the size of the problem and the convergence criterion used.
Possibilities of parallelisation are investigated for both FM and LM based training. Parallel versions of these routines are proposed and shown to give significant speed-ups over the sequential versions for large problems.
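The view of training as unconstrained optimization can be illustrated with a minimal sketch. All weights of a small MLP are collected into one vector w, the sum-squared error E(w) over the whole training set is the objective, and Batch Back Propagation is then steepest descent on E(w) with a fixed step length. The network size (2-2-1), the XOR data, the tanh hidden units, the learning rate and all function names below are assumptions chosen for illustration only; none of them are taken from the paper.

```python
import math
import random

# Hypothetical training set (XOR): pairs of (inputs, target).
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
        ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_HIDDEN = 2
# Each hidden unit has 2 input weights + 1 bias; the linear output
# unit has N_HIDDEN weights + 1 bias.
N_WEIGHTS = 3 * N_HIDDEN + N_HIDDEN + 1


def forward(w, x):
    """Return (hidden activations, network output) for one input x."""
    hidden = [math.tanh(w[3 * j] * x[0] + w[3 * j + 1] * x[1] + w[3 * j + 2])
              for j in range(N_HIDDEN)]
    base = 3 * N_HIDDEN
    y = sum(w[base + j] * hidden[j] for j in range(N_HIDDEN))
    y += w[base + N_HIDDEN]
    return hidden, y


def error(w):
    """Objective E(w): half the sum of squared errors over the batch."""
    return 0.5 * sum((forward(w, x)[1] - t) ** 2 for x, t in DATA)


def gradient(w):
    """Gradient of E(w), accumulated over the batch by back propagation."""
    g = [0.0] * N_WEIGHTS
    base = 3 * N_HIDDEN
    for x, t in DATA:
        hidden, y = forward(w, x)
        delta_out = y - t  # dE/dy for a linear output unit
        for j in range(N_HIDDEN):
            g[base + j] += delta_out * hidden[j]
            # Back-propagate through tanh: d tanh(a)/da = 1 - tanh(a)^2
            delta_h = delta_out * w[base + j] * (1.0 - hidden[j] ** 2)
            g[3 * j] += delta_h * x[0]
            g[3 * j + 1] += delta_h * x[1]
            g[3 * j + 2] += delta_h
        g[base + N_HIDDEN] += delta_out
    return g


def batch_backprop(w, rate=0.02, epochs=500):
    """Steepest descent with a fixed step: w <- w - rate * grad E(w)."""
    history = [error(w)]
    for _ in range(epochs):
        g = gradient(w)
        w = [wi - rate * gi for wi, gi in zip(w, g)]
        history.append(error(w))
    return w, history


rng = random.Random(0)  # fixed seed so the run is repeatable
w0 = [rng.uniform(-0.5, 0.5) for _ in range(N_WEIGHTS)]
w_final, history = batch_backprop(w0)
print("initial error:", history[0])
print("final error:  ", history[-1])
```

Only `batch_backprop` is specific to BBP. A Conjugate Gradient or BFGS routine would call the same `error` and `gradient` functions and differ in how the search direction and step length are chosen, which is where the speed-ups reported above would come from.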
Keywords: Conjugate Gradient, Line Search, Limited Memory, Error Surface, Continuous Stir Tank Reactor