Learning by conjugate gradients
A learning algorithm (CG) with a superlinear convergence rate is introduced. The algorithm is based on a class of optimization techniques well known in numerical analysis as conjugate gradient methods. CG uses second-order information from the neural network but requires only O(N) memory, where N is the number of minimization variables, in our case all the weights in the network. The performance of CG is benchmarked against that of the ordinary backpropagation algorithm (BP). We find that CG is considerably faster than BP and that CG is able to perform the learning task with fewer hidden units.
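To make the approach concrete, the following is a minimal sketch of a nonlinear conjugate gradient training loop in Python/NumPy. The Polak-Ribière update with restarts and the crude backtracking line search are illustrative assumptions, not the paper's exact procedure, and the names `cg_train` and `loss_and_grad` are hypothetical. The point of the sketch is that the loop stores only a few length-N vectors (the weights `w`, the gradient `g`, and the search direction `d`), which is where the O(N) memory claim comes from.

```python
import numpy as np

def cg_train(loss_and_grad, w, n_iters=100, tol=1e-6):
    """Illustrative nonlinear conjugate gradient minimizer (assumed details).

    loss_and_grad(w) -> (loss, gradient) for a flattened weight vector w.
    Only w, the gradient g, and the direction d are stored: O(N) memory.
    """
    f, g = loss_and_grad(w)
    d = -g  # first search direction: steepest descent
    for _ in range(n_iters):
        if np.linalg.norm(g) < tol:
            break
        # Crude backtracking line search along d (a stand-in for the
        # more careful line searches used in practice).
        alpha = 1.0
        while loss_and_grad(w + alpha * d)[0] > f and alpha > 1e-12:
            alpha *= 0.5
        w = w + alpha * d
        f, g_new = loss_and_grad(w)
        # Polak-Ribiere coefficient; clipping at 0 restarts the method
        # with a steepest-descent step when conjugacy degrades.
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        d = -g_new + beta * d
        g = g_new
    return w

# Toy usage: a convex quadratic stands in for a network's error surface.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
quad = lambda w: (0.5 * w @ A @ w, A @ w)
print(cg_train(quad, np.array([1.0, 1.0])))  # converges toward [0, 0]
```

Unlike quasi-Newton methods, no N-by-N matrix is ever formed; the second-order information enters only implicitly through the conjugacy of successive search directions.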
Keywords: Conjugate Gradient Method, Memory Usage, Hidden Unit, Order Information, Conjugate System