An Optimized Second Order Stochastic Learning Algorithm for Neural Network Training

  • Mohamed Khalil-Hani
  • Shan Sung Liew
  • Rabia Bakhteri
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9489)


The performance of a neural network depends critically on its model structure and the corresponding learning algorithm. This paper proposes bounded stochastic diagonal Levenberg-Marquardt (B-SDLM), an improved second-order stochastic learning algorithm for supervised neural network training. The algorithm has only a single hyperparameter and requires negligible additional computation compared to the conventional stochastic gradient descent (SGD) method, while ensuring better learning stability. In our experiments, the proposed algorithm converges very quickly and generalizes better, outperforming several other learning algorithms.
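
The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea behind a stochastic diagonal Levenberg-Marquardt step, not the authors' B-SDLM. It assumes a linear model with squared error, for which the diagonal Gauss-Newton curvature of weight i is simply x_i^2, and it gives each weight its own step size eps / (h_i + mu). The clipping of the step size to [eta_min, eta_max] is a hypothetical stand-in for "bounded"; all function names, parameter names, and default values are illustrative assumptions, and this sketch uses several hyperparameters where the paper reports only one.

```python
import random


def sdlm_train(samples, n_features, epochs=20, eps=0.1, mu=0.1, gamma=0.05,
               eta_min=1e-4, eta_max=1.0, seed=0):
    """Fit y ~ w.x with per-weight SDLM-style learning rates.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    h = [0.0] * n_features  # running estimate of the diagonal curvature
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # Prediction error for this single sample (stochastic update).
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                # Gauss-Newton diagonal term for squared error: x_i^2.
                h[i] = (1.0 - gamma) * h[i] + gamma * xi * xi
                # Levenberg-Marquardt style step size, damped by mu.
                eta = eps / (h[i] + mu)
                # Hypothetical bounding step (an assumption, not from the paper).
                eta = min(max(eta, eta_min), eta_max)
                # Gradient of 0.5 * err^2 with respect to w_i is err * x_i.
                w[i] -= eta * err * xi
    return w


def make_data(n=200, seed=1):
    """Noise-free toy data with badly scaled features (second is 10x larger)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-10, 10))
        y = 2.0 * x[0] - 0.3 * x[1]
        data.append((x, y))
    return data


weights = sdlm_train(make_data(), n_features=2)
print(weights)
```

On this toy problem the two features differ in scale by a factor of ten, so a single global SGD learning rate would have to be small enough for the large feature and would then be slow for the small one. The per-weight curvature estimate is what lets both weights move at a comparable rate.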


Second order method; Fast convergence; Stochastic diagonal Levenberg-Marquardt; Convolutional neural network



This work is supported by Universiti Teknologi Malaysia (UTM) and the Ministry of Science, Technology and Innovation of Malaysia (MOSTI) under the ScienceFund Grant No. 4S116.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Mohamed Khalil-Hani (1)
  • Shan Sung Liew (1), corresponding author
  • Rabia Bakhteri (1)
  1. VeCAD Research Laboratory, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, Skudai, Malaysia
