Mathematical Programming

, Volume 176, Issue 1–2, pp 39–67 | Cite as

Run-and-Inspect Method for nonconvex optimization and global optimality bounds for R-local minimizers

  • Yifan Chen
  • Yuejiao Sun
  • Wotao YinEmail author
Full Length Paper Series B


Many optimization algorithms converge to stationary points. When the underlying problem is nonconvex, they may get trapped at local minimizers or stagnate near saddle points. We propose the Run-and-Inspect Method, which adds an “inspect” phase to existing algorithms that helps escape from non-global stationary points. It samples a set of points in a radius R around the current point. When a sample point yields a sufficient decrease in the objective, we resume an existing algorithm from that point. If no sufficient decrease is found, the current point is called an approximate R-local minimizer. We show that an R-local minimizer is globally optimal, up to a specific error depending on R, if the objective function can be implicitly decomposed into a smooth convex function plus a restricted function that is possibly nonconvex, nonsmooth. Therefore, for such nonconvex objective functions, verifying global optimality is fundamentally easier. For high-dimensional problems, we introduce blockwise inspections to overcome the curse of dimensionality while still maintaining optimality bounds up to a factor equal to the number of blocks. We also present the sample complexities of these methods. When we apply our method to the existing algorithms on a set of artificial and realistic nonconvex problems, we find significantly improved chances of obtaining global minima.


R-local minimizer Run-and-Inspect Method Nonconvex optimization Global minimum Global optimality 

Mathematics Subject Classification

90C26 90C30 49M30 65K05 



  1. 1.
    Chaudhari, P., Choromanska, A., Soatto, S., LeCun, Y.: Entropy-SGD: biasing gradient descent into wide valleys. arXiv preprint arXiv:1611.01838 (2016)
  2. 2.
    Chaudhari, P., Oberman, A., Osher, S., Soatto, S., Carlier, G.: Deep relaxation: partial differential equations for optimizing deep neural networks. Res. Math. Sci. 5(3), 30 (2018)MathSciNetCrossRefGoogle Scholar
  3. 3.
    Conn, A.R., Gould, N.I., Toint, P.L.: Trust Region Methods. SIAM, Philadelphia (2000)CrossRefzbMATHGoogle Scholar
  4. 4.
    Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to Derivative-Free Optimization. No. 8 in MPS-SIAM Series on Optimization. Society for Industrial and Applied Mathematics/Mathematical Programming Society, Philadelphia (2009)CrossRefGoogle Scholar
  5. 5.
    Fox, J.: An R and S-Plus Companion to Applied Regression. Sage Publications, Thousand Oaks (2002)Google Scholar
  6. 6.
    Ge, R., Huang, F., Jin, C., Yuan, Y.: Escaping from saddle points—online stochastic gradient for tensor decomposition. In: Conference on Learning Theory, pp. 797–842 (2015)Google Scholar
  7. 7.
    Ge, R., Lee, J.D., Ma, T.: Matrix completion has no spurious local minimum. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, pp. 2973–2981. Curran Associates, Inc. (2016)Google Scholar
  8. 8.
    Ghadimi, S., Lan, G.: Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Program. 156(1–2), 59–99 (2016)MathSciNetCrossRefzbMATHGoogle Scholar
  9. 9.
    Jin, C., Ge, R., Netrapalli, P., Kakade, S.M., Jordan, M.I.: How to escape saddle points efficiently. arXiv preprint arXiv:1703.00887 (2017)
  10. 10.
    Karimi, H., Nutini, J., Schmidt, M.: Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: Frasconi, P., Landwehr, N., Manco, G., Vreeken, J. (eds.) Machine Learning and Knowledge Discovery in Databases, vol. 9851, pp. 795–811. Springer, Cham (2016)CrossRefGoogle Scholar
  11. 11.
    Kirkpatrick, S., Gelatt Jr., C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983). MathSciNetCrossRefzbMATHGoogle Scholar
  12. 12.
    Martínez, J.M., Raydan, M.: Cubic-regularization counterpart of a variable-norm trust-region method for unconstrained minimization. J. Glob. Optim. 68(2), 367–385 (2017)MathSciNetCrossRefzbMATHGoogle Scholar
  13. 13.
    Nesterov, Y., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108(1), 177–205 (2006)MathSciNetCrossRefzbMATHGoogle Scholar
  14. 14.
    Panageas, I., Piliouras, G.: Gradient descent converges to minimizers: the case of non-isolated critical points. CoRR arXiv:1605.00405 (2016)
  15. 15.
    Pascanu, R., Dauphin, Y.N., Ganguli, S., Bengio, Y.: On the saddle point problem for non-convex optimization. arXiv preprint arXiv:1405.4604 (2014)
  16. 16.
    Peng, Z., Wu, T., Xu, Y., Yan, M., Yin, W.: Coordinate friendly structures, algorithms and applications. Ann. Math. Sci. Appl. 1(1), 57–119 (2016)MathSciNetzbMATHGoogle Scholar
  17. 17.
    Polyak, B.T.: Gradient methods for minimizing functionals. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki 3(4), 643–653 (1963)MathSciNetzbMATHGoogle Scholar
  18. 18.
    Reddi, S.J., Hefny, A., Sra, S., Poczos, B., Smola, A.: Stochastic variance reduction for nonconvex optimization. In: International Conference on Machine Learning, pp. 314–323 (2016)Google Scholar
  19. 19.
    Sagun, L., Bottou, L., LeCun, Y.: Singularity of the Hessian in deep learning. arXiv preprint arXiv:1611.07476 (2016)
  20. 20.
    Shen, X., Gu, Y.: Nonconvex sparse logistic regression with weakly convex regularization. IEEE Trans. Signal Process. 66(12), 3199–3211 (2018)MathSciNetCrossRefzbMATHGoogle Scholar
  21. 21.
    Sun, J., Qu, Q., Wright, J.: Complete dictionary recovery over the sphere. In: 2015 International Conference on Sampling Theory and Applications (SampTA), pp. 407–410. IEEE (2015)Google Scholar
  22. 22.
    Sun, J., Qu, Q., Wright, J.: A geometric analysis of phase retrieval. In: 2016 IEEE International Symposium on Information Theory (ISIT), pp. 2379–2383. IEEE (2016)Google Scholar
  23. 23.
    Wang, Y., Yin, W., Zeng, J.: Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 78, 29–63 (2018)MathSciNetCrossRefzbMATHGoogle Scholar
  24. 24.
    Wu, L., Zhu, Z., Weinan, E.: Towards understanding generalization of deep learning: perspective of loss landscapes. arXiv preprint arXiv:1706.10239 (2017)
  25. 25.
    Xu, Y., Yin, W.: A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 6(3), 1758–1789 (2013)MathSciNetCrossRefzbMATHGoogle Scholar
  26. 26.
    Xu, Z., Chang, X., Xu, F., Zhang, H.: \(l_{1/2}\) regularization: a thresholding representation theory and a fast solver. IEEE Trans. Neural Netw. Learn. Syst. 23(7), 1013–1027 (2012)CrossRefGoogle Scholar
  27. 27.
    Yin, P., Pham, M., Oberman, A., Osher, S.: Stochastic backward Euler: an implicit gradient descent algorithm for k-means clustering. J. Sci. Comput. 77(2), 1133–1146 (2018)MathSciNetCrossRefzbMATHGoogle Scholar
  28. 28.
    Zeng, J., Peng, Z., Lin, S.: GAITA: a Gauss–Seidel iterative thresholding algorithm for \(\ell _q\) regularized least squares regression. J. Comput. Appl. Math. 319, 220–235 (2017)MathSciNetCrossRefzbMATHGoogle Scholar
  29. 29.
    Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010)MathSciNetCrossRefzbMATHGoogle Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature and Mathematical Optimization Society 2019

Authors and Affiliations

  1. 1.Department of Mathematical SciencesTsinghua UniversityBeijingChina
  2. 2.Department of MathematicsUniversity of CaliforniaLos AngelesUSA

Personalised recommendations