Computational Optimization and Applications, Volume 72, Issue 1, pp 115–157

Structured nonconvex and nonsmooth optimization: algorithms and iteration complexity analysis

  • Bo Jiang
  • Tianyi Lin
  • Shiqian Ma
  • Shuzhong Zhang


Abstract

Nonconvex and nonsmooth optimization problems are frequently encountered in statistics, business, science, and engineering, but they are not yet widely recognized as a technology in the sense of scalability. A reason for this relatively low degree of popularity is the lack of a well-developed system of theory and algorithms to support the applications, as is the case for their convex counterparts. This paper aims to take one step in the direction of disciplined nonconvex and nonsmooth optimization. In particular, we consider some constrained nonconvex optimization models in block decision variables, with or without coupled affine constraints. In the absence of coupled constraints, we show a sublinear rate of convergence to an \(\epsilon \)-stationary solution in the form of a variational inequality for a generalized conditional gradient method, where the convergence rate depends on the Hölderian continuity of the gradient of the smooth part of the objective. For the model with coupled affine constraints, we introduce corresponding \(\epsilon \)-stationarity conditions and apply two proximal-type variants of the ADMM to solve such a model, assuming the proximal ADMM updates can be implemented for all the block variables except for the last block, for which either a gradient step or a majorization–minimization step is implemented. We show an iteration complexity bound of \(O(1/\epsilon ^2)\) to reach an \(\epsilon \)-stationary solution for both algorithms. Moreover, we show that the same iteration complexity of a proximal BCD method follows immediately. Numerical results are provided to illustrate the efficacy of the proposed algorithms for tensor robust PCA and tensor sparse PCA problems.
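The conditional gradient (Frank–Wolfe) iteration underlying the first result can be sketched as follows, using the Frank–Wolfe gap as the stationarity measure. This is a minimal illustrative sketch, not the paper's exact algorithm: the indefinite quadratic objective, the \(\ell_1\)-ball feasible set, and the diminishing step rule \(\gamma_k = 2/(k+2)\) are all assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
A = rng.standard_normal((n, n))
Q = (A + A.T) / 2              # symmetric indefinite matrix -> f is nonconvex

def grad(x):
    """Gradient of the test objective f(x) = 0.5 * x^T Q x."""
    return Q @ x

def lmo_l1(g, radius=1.0):
    """Linear minimization oracle over the l1 ball:
    argmin_{||s||_1 <= radius} <g, s> is a signed vertex."""
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])
    return s

x = np.zeros(n)
x[0] = 0.5                      # feasible starting point, ||x||_1 <= 1
for k in range(500):
    g = grad(x)
    s = lmo_l1(g)
    fw_gap = g @ (x - s)        # Frank-Wolfe gap: nonnegative, zero at a
                                # stationary point of the constrained problem
    if fw_gap <= 1e-6:
        break
    gamma = 2.0 / (k + 2)       # diminishing step size
    x = (1 - gamma) * x + gamma * s  # convex combination stays feasible

print("final Frank-Wolfe gap:", fw_gap)
```

Because each iterate is a convex combination of feasible points, feasibility is maintained without projections; the Frank–Wolfe gap plays the role of the \(\epsilon \)-stationarity measure that the paper's variational-inequality criterion generalizes.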


Keywords: Structured nonconvex optimization · \(\epsilon \)-Stationary solution · Iteration complexity · Conditional gradient method · Alternating direction method of multipliers · Block coordinate descent method

Mathematics Subject Classification

90C26 · 90C06 · 90C60



Acknowledgements

We would like to thank Professor Renato D. C. Monteiro and two anonymous referees for their insightful comments, which helped improve this paper significantly.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Research Institute for Interdisciplinary Sciences, School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai, China
  2. Department of Industrial Engineering and Operations Research, UC Berkeley, Berkeley, USA
  3. Department of Mathematics, UC Davis, Davis, USA
  4. Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, USA
  5. Institute of Data and Decision Analytics, The Chinese University of Hong Kong (Shenzhen), and Shenzhen Research Institute of Big Data, Shenzhen, China