Perturbed Proximal Descent to Escape Saddle Points for Non-convex and Non-smooth Objective Functions

  • Zhishen Huang
  • Stephen Becker
Conference paper
Part of the Proceedings of the International Neural Networks Society book series (INNS, volume 1)


We consider the problem of finding local minimizers in non-convex and non-smooth optimization. Under the assumption of strict saddle points, positive results have been derived for first-order methods. We present the first known results for the non-smooth case, which requires different analysis and a different algorithm. This is the extended version of the paper that contains the proofs.
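To make the setting concrete, here is a minimal sketch of a perturbed proximal gradient iteration on a toy problem. It is not the algorithm analyzed in the paper, whose details are not given on this page. Everything in it is an assumption chosen for illustration: the test objective f(x, y) + g(x, y) with f(x, y) = 0.5x^2 - 0.5y^2 + 0.25y^4 and g(x, y) = LAM*|x|, the names prox_descent, radius, wait and min_drop, the box-shaped perturbation, and the stopping rule (stop when a perturbation fails to lower the objective). The origin is a non-smooth saddle of this objective: the objective has a kink in x there and strictly decreases along y.

```python
import math
import random

LAM = 0.1  # weight of the non-smooth term g(x, y) = LAM * |x| (illustrative value)


def objective(x, y):
    # Smooth non-convex part f plus the non-smooth part g.
    return 0.5 * x * x - 0.5 * y * y + 0.25 * y ** 4 + LAM * abs(x)


def grad_f(x, y):
    # Gradient of the smooth part f only.
    return x, -y + y ** 3


def prox_g(x, y, step):
    # Proximal operator of step * g: soft-threshold x, leave y unchanged.
    t = step * LAM
    return math.copysign(max(abs(x) - t, 0.0), x), y


def prox_descent(x, y, step=0.1, iters=2000, radius=0.0,
                 tol=1e-4, wait=100, min_drop=1e-3, seed=0):
    """Proximal gradient descent; radius > 0 enables random perturbations."""
    rng = random.Random(seed)
    anchor = None  # (iteration, point, objective) saved at the last perturbation
    for k in range(iters):
        if anchor is not None and k - anchor[0] == wait:
            if anchor[2] - objective(x, y) < min_drop:
                # The kick did not lower the objective: return the pre-kick point.
                return anchor[1]
            anchor = None
        gx, gy = grad_f(x, y)
        nx, ny = prox_g(x - step * gx, y - step * gy, step)
        # Norm of the gradient mapping, the proximal analogue of the gradient norm.
        gmap = math.hypot(x - nx, y - ny) / step
        x, y = nx, ny
        if radius > 0 and anchor is None and gmap < tol:
            anchor = (k, (x, y), objective(x, y))
            # Assumption: uniform noise from a small box around the iterate.
            x += rng.uniform(-radius, radius)
            y += rng.uniform(-radius, radius)
    return x, y


plain = prox_descent(1.0, 0.0, radius=0.0)       # no perturbation
perturbed = prox_descent(1.0, 0.0, radius=0.01)  # with perturbation
print("plain:", plain, objective(*plain))
print("perturbed:", perturbed, objective(*perturbed))
```

Started at (1, 0), the unperturbed iteration never leaves the line y = 0 and stalls at the saddle at the origin, while the perturbed run leaves it and settles near one of the minimizers at y = ±1.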


Saddle-points; Proximal gradient descent; Non-smooth optimization



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Applied Mathematics, University of Colorado, Boulder, USA
