Abstract
Superlinear algorithms are highly regarded for their speed-to-complexity ratio: superlinear convergence rates combined with linear computational complexity make them the primary choice for large-scale tasks. However, their varying performance across tasks raises the question of the relationship between an algorithm and the task it is applied to. To address this issue we establish a classification framework for both algorithms and tasks. The proposed framework permits independent specification of functions and optimization techniques, and within it the task of training MLP neural networks is classified. The presented theoretical material enables the design of superlinear first-order algorithms tailored to a particular task. We introduce two such techniques, in which the line-search subproblem is simplified to a single-step calculation of appropriate values of the step length and/or momentum term. This markedly simplifies the implementation and reduces the computational complexity of the line-search subproblem, yet does not harm the stability of the methods. The algorithms are theoretically proved convergent. Their performance is extensively evaluated on five data sets and compared to relevant first-order optimization techniques.
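To illustrate the general shape of a first-order update with step length and momentum, and of a line search reduced to a single closed-form calculation, consider the following minimal sketch. It uses the exact step length for a quadratic objective, alpha = (g'g)/(g'Ag), as a stand-in; the chapter's own step-length and momentum formulas for MLP training are not reproduced here, and the function `first_order_minimize`, the fixed momentum term `mu`, and the quadratic test problem are all illustrative assumptions.

```python
import numpy as np

def first_order_minimize(A, b, mu=0.5, tol=1e-10, max_iter=5000):
    """Minimize the quadratic f(w) = 0.5 w'Aw - b'w with a first-order
    update  w <- w - alpha*g + mu*(w - w_prev).

    For a quadratic, the exact line-search step length has the closed
    form alpha = (g'g)/(g'Ag), so the line-search subproblem collapses
    to a single-step calculation with no inner iteration."""
    w = np.zeros(len(b))
    w_prev = w.copy()
    for _ in range(max_iter):
        g = A @ w - b                      # gradient of the quadratic
        if np.linalg.norm(g) < tol:
            break
        alpha = (g @ g) / (g @ A @ g)      # single-step step length
        w, w_prev = w - alpha * g + mu * (w - w_prev), w
    return w

# Small symmetric positive-definite test problem (illustrative).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w_star = first_order_minimize(A, b)
```

The point of the sketch is structural: the step length is obtained in one evaluation rather than by an iterative search, which is the kind of simplification the chapter's algorithms apply to MLP training.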
© 2004 Springer-Verlag Berlin Heidelberg
Géczy, P., Usui, S. (2004). Superlinear Learning Algorithm Design. In: Rajapakse, J.C., Wang, L. (eds) Neural Information Processing: Research and Development. Studies in Fuzziness and Soft Computing, vol 152. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39935-3_11
DOI: https://doi.org/10.1007/978-3-540-39935-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53564-2
Online ISBN: 978-3-540-39935-3