
Solving the Local-Minimum Problem in Training Deep Learning Machines

  • Conference paper
Neural Information Processing (ICONIP 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10634)


Abstract

The local-minimum problem in training deep learning machines (DLMs) has long plagued their development. This paper proposes a method to solve the problem directly. The method is based on convexifying the sum squared error (SSE) criterion by transforming it into a risk-averting error (RAE) criterion; to alleviate the numerical difficulties the RAE poses, a normalized RAE (NRAE) is employed. The convexity region of the RAE expands as its risk sensitivity index (RSI) increases. To make the best use of this convexity region, the method starts training with a very large RSI, gradually reduces it, and switches from the NRAE to the RAE as soon as the RAE is numerically feasible. After training converges, the resultant DLM is expected to lie inside the attraction basin of a global minimum of the SSE. Numerical results demonstrate the effectiveness of the proposed method.
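For a concrete picture of the approach, below is a minimal PyTorch sketch of how the criteria and schedule described in the abstract might look. The NRAE is written as a log-mean-exp of the λ-scaled squared errors, (1/λ) ln((1/K) Σ_k exp(λ e_k²)), following the form of the normalized risk-averting error in the authors' earlier work; the network, data, learning rate, starting RSI, decay factor, and switching threshold are all illustrative assumptions, not the paper's settings.

```python
import math
import torch

def rae(pred, target, lam):
    # Risk-averting error: mean of exp(lam * e_k^2) over the errors e_k.
    # Overflows for large lam, which is why training starts with the NRAE.
    sq = (pred - target).pow(2).flatten()
    return torch.exp(lam * sq).mean()

def nrae(pred, target, lam):
    # Normalized RAE: (1/lam) * log((1/K) * sum_k exp(lam * e_k^2)),
    # computed with logsumexp so that a very large RSI lam stays finite.
    sq = (pred - target).pow(2).flatten()
    return (torch.logsumexp(lam * sq, dim=0) - math.log(sq.numel())) / lam

# Illustrative schedule: start with a very large RSI, shrink it every
# epoch, and switch from the NRAE to the RAE once exp() cannot overflow.
model = torch.nn.Sequential(torch.nn.Linear(8, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(512, 8), torch.randn(512, 1)  # stand-in data

lam = 1e6  # hypothetical starting RSI
for epoch in range(300):
    opt.zero_grad()
    pred = model(x)
    # float32 exp() overflows near exp(88); 80 leaves a safety margin
    rae_safe = lam * (pred - y).pow(2).max().item() < 80.0
    loss = rae(pred, y, lam) if rae_safe else nrae(pred, y, lam)
    loss.backward()
    opt.step()
    lam = max(0.9 * lam, 1.0)  # gradually reduce the RSI (assumed decay)
```

The logsumexp computation is what makes a huge initial RSI workable: for λ around 10⁶ the NRAE and its gradient remain finite while the raw RAE would overflow immediately. The switching rule shown here (a float32 overflow check) is one plausible reading of "numerically feasible"; the paper's actual rule may differ.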



Acknowledgements

The work was supported in part by the U.S. National Science Foundation under Grants ECCS1028048 and ECCS1508880; it does not necessarily reflect the position or policy of the U.S. Government.

Author information


Corresponding author

Correspondence to James Ting-Ho Lo.



Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Lo, J.T.-H., Gui, Y., Peng, Y. (2017). Solving the Local-Minimum Problem in Training Deep Learning Machines. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.-S. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science (LNTCS), vol 10634. Springer, Cham. https://doi.org/10.1007/978-3-319-70087-8_18


  • DOI: https://doi.org/10.1007/978-3-319-70087-8_18


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70086-1

  • Online ISBN: 978-3-319-70087-8

  • eBook Packages: Computer Science, Computer Science (R0)
