Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 152)

Abstract

Superlinear algorithms are highly regarded for their speed-to-complexity ratio. With superlinear convergence rates and linear computational complexity, they are the primary choice for large-scale tasks. However, their varying performance on different tasks raises the question of the relationship between an algorithm and the task it is applied to. To approach this issue, we establish a classification framework for both algorithms and tasks. The proposed framework permits independent specification of functions and optimization techniques. Within this framework, the task of training MLP neural networks is classified. The presented theoretical material allows the design of superlinear first order algorithms tailored to a particular task. We introduce two such techniques, in which the line search subproblem is simplified to a single-step calculation of appropriate values of the step length and/or momentum term. This markedly simplifies the implementation and reduces the computational complexity of the line search subproblem, yet does not harm the stability of the methods. The algorithms are proven convergent. Their performance is extensively evaluated on five data sets and compared to relevant first order optimization techniques.
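
The chapter itself derives the actual update rules. As a rough, purely illustrative sketch of the general idea (a first-order update whose step length comes from a single closed-form calculation instead of an iterative line search, combined with a momentum term), one might write something like the following. The gradient-norm-based step length and the constant momentum coefficient are assumptions chosen for illustration only, not the authors' formulas.

```python
import numpy as np

def single_step_update(w, grad, prev_update, eta=0.5, mu=0.9, eps=1e-12):
    """One hypothetical first-order update with a directly computed step length.

    w           -- current weight vector
    grad        -- gradient of the training error at w
    prev_update -- update applied at the previous iteration (momentum source)
    eta, mu     -- assumed step-length scale and momentum coefficient
    """
    # "Single-step" step length: one closed-form calculation (here, inverse
    # scaling by the gradient norm), so no inner loop over trial step sizes.
    alpha = eta / (np.linalg.norm(grad) + eps)
    update = -alpha * grad + mu * prev_update
    return w + update, update

# Example: one update on a toy quadratic error E(w) = 0.5 * ||w||^2 (grad = w).
w = np.array([1.0, -2.0])
prev = np.zeros_like(w)
w, prev = single_step_update(w, grad=w, prev_update=prev)
```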

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Géczy, P., Usui, S. (2004). Superlinear Learning Algorithm Design. In: Rajapakse, J.C., Wang, L. (eds) Neural Information Processing: Research and Development. Studies in Fuzziness and Soft Computing, vol 152. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39935-3_11

  • DOI: https://doi.org/10.1007/978-3-540-39935-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53564-2

  • Online ISBN: 978-3-540-39935-3
