Computational Mathematics and Modeling, Volume 30, Issue 4, pp 427–438

Two-Point Step Size Gradient Method for Solving a Deep Learning Problem

  • T. D. Todorov
  • G. S. Tsanev

This paper is devoted to an analysis of the rate of deep belief learning by multilayer neural networks. In designing neural networks, many authors have applied the mean field approximation (MFA) to establish that the state of neurons in hidden layers is active. To study the convergence of the MFAs, we transform the original problem to a minimization one. The object of investigation is the Barzilai–Borwein method for solving the obtained optimization problem. The essence of the two-point step size gradient method is its variable steplength. The appropriate steplength depends on the objective functional. Original steplengths are obtained and compared with the classical steplength. Sufficient conditions for existence and uniqueness of the weak solution are established. A rigorous proof of the convergence theorem is presented. Various tests with different kinds of weight matrices are discussed.
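
The abstract does not reproduce the paper's objective functional or its original steplengths, so the following is only a minimal sketch of the classical two-point step size (Barzilai–Borwein) gradient iteration, under stated assumptions. The test problem (a small strictly convex quadratic), the function name `bb_gradient`, and the parameters `alpha0`, `tol`, and `max_iter` are hypothetical choices made for illustration and are not taken from the paper.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def bb_gradient(grad, x0, alpha0=0.1, tol=1e-10, max_iter=500):
    """Gradient iteration with the classical Barzilai-Borwein steplength.

    grad   : callable returning the gradient of the objective at x
    x0     : starting point (sequence of floats)
    alpha0 : steplength for the first step (assumed value; also the
             fallback whenever the curvature estimate is not positive)
    """
    x = list(x0)
    g = grad(x)
    alpha = alpha0
    for k in range(max_iter):
        # Stop once the gradient norm falls below the tolerance.
        if dot(g, g) ** 0.5 < tol:
            return x, k
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]  # difference of iterates
        y = [a - b for a, b in zip(g_new, g)]  # difference of gradients
        sy = dot(s, y)
        # Classical two-point steplength: alpha = (s.s) / (s.y).
        alpha = dot(s, s) / sy if sy > 0 else alpha0
        x, g = x_new, g_new
    return x, max_iter


# Hypothetical test problem: f(x) = 0.5 * x^T A x - b^T x with A symmetric
# positive definite, so the minimizer solves A x = b.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]


def quad_grad(x):
    """Gradient of the quadratic test objective: A x - b."""
    return [dot(row, x) - bi for row, bi in zip(A, b)]


x_star, iters = bb_gradient(quad_grad, [0.0, 0.0])
print(x_star, iters)
```

The steplength is recomputed at every iteration from the two most recent iterates and gradients, which is the variable steplength the abstract refers to. The fallback to `alpha0` when the curvature estimate is not positive is a common safeguard added here as an assumption; it is never triggered on a strictly convex quadratic.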


Keywords: Deep Boltzmann machine, mean field approximation, gradient iterative methods





Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Mathematics and Informatics, Technical University, Gabrovo, Bulgaria
  2. Department of Computer Systems and Technology, Technical University, Gabrovo, Bulgaria
