Abstract
Superlinear algorithms are highly regarded for their speed-to-complexity ratio: superlinear convergence rates combined with linear computational complexity make them the primary choice for large-scale tasks. However, their varying performance across tasks raises the question of the relationship between an algorithm and the task it is applied to. To address this issue we establish a classification framework for both algorithms and tasks. The proposed framework permits independent specification of functions and optimization techniques, and within it the task of training MLP neural networks is classified. The presented theoretical material enables the design of superlinear first-order algorithms tailored to a particular task. We introduce two such techniques, in which the line-search subproblem is simplified to a single-step calculation of appropriate values of the step length and/or momentum term. This markedly simplifies the implementation and reduces the computational complexity of the line-search subproblem, yet does not harm the stability of the methods. The algorithms are theoretically proved convergent. Their performance is extensively evaluated on five data sets and compared to relevant first-order optimization techniques.
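To illustrate the general shape of a first-order update with step length and momentum, and of a line search reduced to a single closed-form calculation, consider the following minimal sketch. It uses the exact step length for a quadratic objective, alpha = (g'g)/(g'Ag), as a stand-in; the chapter's own step-length and momentum formulas for MLP training are not reproduced here, and the function `first_order_minimize`, the fixed momentum term `mu`, and the quadratic test problem are all illustrative assumptions.

```python
import numpy as np

def first_order_minimize(A, b, mu=0.5, tol=1e-10, max_iter=5000):
    """Minimize the quadratic f(w) = 0.5 w'Aw - b'w with a first-order
    update  w <- w - alpha*g + mu*(w - w_prev).

    For a quadratic, the exact line-search step length has the closed
    form alpha = (g'g)/(g'Ag), so the line-search subproblem collapses
    to a single-step calculation with no inner iteration."""
    w = np.zeros(len(b))
    w_prev = w.copy()
    for _ in range(max_iter):
        g = A @ w - b                      # gradient of the quadratic
        if np.linalg.norm(g) < tol:
            break
        alpha = (g @ g) / (g @ A @ g)      # single-step step length
        w, w_prev = w - alpha * g + mu * (w - w_prev), w
    return w

# Small symmetric positive-definite test problem (illustrative).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w_star = first_order_minimize(A, b)
```

The point of the sketch is structural: the step length is obtained in one evaluation rather than by an iterative search, which is the kind of simplification the chapter's algorithms apply to MLP training.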
© 2004 Springer-Verlag Berlin Heidelberg
Géczy, P., Usui, S. (2004). Superlinear Learning Algorithm Design. In: Rajapakse, J.C., Wang, L. (eds) Neural Information Processing: Research and Development. Studies in Fuzziness and Soft Computing, vol 152. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39935-3_11
DOI: https://doi.org/10.1007/978-3-540-39935-3_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53564-2
Online ISBN: 978-3-540-39935-3