Abstract
This paper presents an overview of global issues in optimization methods for training feedforward neural networks (FNN) in a regression setting. We first recall the learning optimization paradigm for FNN and briefly discuss global schemes for the joint choice of the network topology and the network parameters. The main part of the paper focuses on the core subproblem, the continuous unconstrained (regularized) weight optimization problem, with the aim of reviewing global methods arising specifically both in multilayer perceptron/deep networks and in radial basis networks. We review some recent results on the existence of non-global stationary points of the unconstrained nonlinear problem and on the role of determining a global solution in a supervised learning paradigm. Local algorithms that are widely used to solve the continuous unconstrained problems are addressed, with a focus on possible improvements to exploit global properties. Hybrid global methods specifically devised for FNN training optimization problems, which embed local algorithms, are also discussed.
Notes
Usually the data set is divided into K subsets and the average validation error across all K trials is computed (K-fold cross validation).
The picture has been obtained using a shallow network with \(N=100\), fixing all the weights to a given random value except for the two in the input-to-hidden layer.
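The K-fold cross-validation procedure mentioned in the first note can be sketched as follows. This is only a minimal illustration under stated assumptions: the helper names (`k_fold_indices`, `fit_line`, `k_fold_validation_error`), the synthetic data, and the one-dimensional least-squares line used in place of an actual network training step are all hypothetical and are not taken from the paper.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b (a stand-in for network training)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    return a, y_bar - a * x_bar


def k_fold_validation_error(xs, ys, k=5, seed=0):
    """Average the mean-squared validation error over the k trials."""
    folds = k_fold_indices(len(xs), k, seed)
    errors = []
    for held_out in folds:
        held = set(held_out)
        # Train on every sample that is not in the held-out fold.
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Validate on the held-out fold only.
        errors.append(mean((ys[i] - (a * xs[i] + b)) ** 2 for i in held_out))
    return mean(errors)


# Synthetic, noise-free data y = 2x + 1 (illustrative assumption).
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
cv_error = k_fold_validation_error(xs, ys, k=5)
```

In a model-selection setting, the same averaged error would typically be computed for each candidate choice of hyperparameters (for instance, the number of neurons), and the candidate with the smallest value retained.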
Acknowledgements
Many thanks to two anonymous referees who carefully read the paper and gave useful suggestions that allowed me to improve it substantially. Thanks to Marianna De Santis and to the Ph.D. students at DIAG who gave their comments on a first version of the paper. Finally, I wish to thank Prof. Luigi Grippo for pleasant and fruitful conversations on optimization topics, not only about ML, since the time of my Ph.D.
Additional information
The author acknowledges support within the project “Distributed optimization algorithms for Big Data” (2017) (No RM11715C7E49E89C) which has received funding from Sapienza, University of Rome.
About this article
Cite this article
Palagi, L. Global optimization issues in deep network regression: an overview. J Glob Optim 73, 239–277 (2019). https://doi.org/10.1007/s10898-018-0701-7
DOI: https://doi.org/10.1007/s10898-018-0701-7