
Global optimization issues in deep network regression: an overview

Journal of Global Optimization

Abstract

The paper presents an overview of global issues in optimization methods for training feedforward neural networks (FNN) in a regression setting. We first recall the learning optimization paradigm for FNN and briefly discuss global schemes for the joint choice of the network topology and of the network parameters. The main part of the paper focuses on the core subproblem, namely the continuous unconstrained (regularized) weights optimization problem, with the aim of reviewing global methods that arise specifically both in multilayer perceptron/deep networks and in radial basis function networks. We review some recent results on the existence of non-global stationary points of the unconstrained nonlinear problem and on the role of determining a global solution in a supervised learning paradigm. Local algorithms that are widely used to solve the continuous unconstrained problems are addressed, with a focus on possible improvements to exploit their global properties. Hybrid global methods specifically devised for FNN training optimization problems, which embed local algorithms, are discussed as well.
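For reference, the core subproblem mentioned above can be written, in a generic form (a sketch under standard regularized least-squares notation, which the full paper may state differently), as the unconstrained weights optimization problem

\[ \min_{w \in \mathbb{R}^{n}} \; E(w) \;=\; \sum_{p=1}^{P} \bigl( y(x^{p}; w) - t^{p} \bigr)^{2} \;+\; \rho \, \Vert w \Vert^{2}, \]

where \(\{(x^{p}, t^{p})\}_{p=1}^{P}\) are the training pairs, \(y(\cdot\,; w)\) is the input-output mapping realized by the FNN with weight vector \(w\), and \(\rho \ge 0\) is a regularization parameter. For multilayer perceptron/deep and radial basis function architectures the objective \(E\) is nonconvex in \(w\), which is the source of the global optimization issues surveyed in the paper.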


Fig. 1
Fig. 2


Notes

  1. Usually the data set is divided into K subsets and the average validation error across the K trials is computed (K-fold cross-validation); a minimal sketch of this procedure is given after these notes.

  2. The picture has been obtained using a shallow network with \(N=100\) hidden units, fixing all the weights to given random values except for two weights in the input-to-hidden layer; a sketch of the procedure is given after these notes.
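Regarding note 1, the following is a minimal sketch of the K-fold procedure (not code from the paper); the train_and_evaluate helper is a hypothetical stand-in for any routine that trains a model on the training split and returns its validation error.

```python
import numpy as np

def k_fold_cv_error(X, y, train_and_evaluate, K=5, seed=0):
    """Average validation error over K folds (K-fold cross-validation sketch)."""
    P = X.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.permutation(P)
    folds = np.array_split(idx, K)          # K disjoint subsets of the data
    errors = []
    for k in range(K):
        val_idx = folds[k]                  # k-th subset held out for validation
        tr_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        errors.append(train_and_evaluate(X[tr_idx], y[tr_idx], X[val_idx], y[val_idx]))
    return float(np.mean(errors))           # average validation error across the K trials
```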

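Regarding note 2, the following sketch illustrates how such a picture could be produced: all weights of a shallow network with \(N=100\) hidden units are frozen at random values, two input-to-hidden weights are swept over a grid, and the training error is evaluated at each grid point. The synthetic data, tanh activation, and grid ranges are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 100, 50                              # hidden units, training samples (assumed)
X = rng.uniform(-1.0, 1.0, size=(P, 1))     # synthetic 1-dimensional inputs
t = np.sin(3.0 * X[:, 0])                   # synthetic regression targets

W = rng.normal(size=(N, 1))                 # input-to-hidden weights (fixed at random values)
b = rng.normal(size=N)                      # hidden biases (fixed)
v = rng.normal(size=N)                      # hidden-to-output weights (fixed)

def training_error(w1, w2):
    """Squared training error as a function of two input-to-hidden weights."""
    Wloc = W.copy()
    Wloc[0, 0], Wloc[1, 0] = w1, w2         # only these two weights vary
    out = np.tanh(X @ Wloc.T + b) @ v       # shallow network output
    return np.sum((out - t) ** 2)

grid = np.linspace(-5.0, 5.0, 80)
E = np.array([[training_error(w1, w2) for w1 in grid] for w2 in grid])
# E can then be visualized as a surface or contour plot over the (w1, w2) grid.
```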

Acknowledgements

Many thanks to the two anonymous referees who carefully read the paper and gave useful suggestions that substantially improved it. Thanks to Marianna De Santis and to the Ph.D. students at DIAG who commented on a first version of the paper. Finally, I wish to thank Prof. Luigi Grippo for pleasant and fruitful conversations on optimization topics, not only about ML, since the time of my Ph.D.

Author information


Correspondence to Laura Palagi.

Additional information

The author acknowledges support from the project “Distributed optimization algorithms for Big Data” (2017) (No. RM11715C7E49E89C), which received funding from Sapienza University of Rome.


About this article


Cite this article

Palagi, L. Global optimization issues in deep network regression: an overview. J Glob Optim 73, 239–277 (2019). https://doi.org/10.1007/s10898-018-0701-7

