
A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems

Full Length Paper | Mathematical Programming, Series B

Abstract

We consider a class of nonconvex nonsmooth optimization problems whose objective is the sum of a smooth function and a finite number of nonnegative proper closed, possibly nonsmooth, functions (whose proximal mappings are easy to compute), some of which are further composed with linear maps. This class of problems arises naturally in various applications when different regularizers are introduced to induce simultaneous structures in the solutions. Solving such problems, however, can be challenging because of the coupled nonsmooth functions: the proximal mapping of their sum can be hard to compute, so standard first-order methods such as the proximal gradient algorithm cannot be applied efficiently. In this paper, we propose a successive difference-of-convex approximation method for solving this class of problems. In each iteration, the algorithm approximates the nonsmooth functions by their Moreau envelopes. Making use of the simple observation that Moreau envelopes of nonnegative proper closed functions are continuous difference-of-convex functions, we can then approximately minimize the approximation function by first-order methods with suitable majorization techniques; these first-order methods can be implemented efficiently because the proximal mapping of each nonsmooth function is easy to compute. Under suitable assumptions, we prove that the sequence generated by our method is bounded and that any of its accumulation points is a stationary point of the objective. We also discuss how our method can be applied to concrete applications such as nonconvex fused regularized optimization problems and simultaneously structured matrix optimization problems, and we illustrate its numerical performance on these two applications.
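To make the key observation in the abstract concrete: for a nonnegative proper closed function f and \(\lambda > 0\), the Moreau envelope admits the difference-of-convex decomposition below. The identity is standard; the notation here is ours and does not follow the paper's numbering.

$$\begin{aligned} e_{\lambda }f({\varvec{x}}) := \inf _{{\varvec{u}}}\left\{ f({\varvec{u}}) + \frac{1}{2\lambda }\Vert {\varvec{x}} - {\varvec{u}}\Vert ^2\right\} = \frac{1}{2\lambda }\Vert {\varvec{x}}\Vert ^2 - \sup _{{\varvec{u}}}\left\{ \frac{1}{\lambda }\langle {\varvec{x}},{\varvec{u}}\rangle - f({\varvec{u}}) - \frac{1}{2\lambda }\Vert {\varvec{u}}\Vert ^2\right\} . \end{aligned}$$

The first term is convex, and the second is a pointwise supremum of affine functions of \({\varvec{x}}\), hence also convex; since \(f\ge 0\), this supremum is bounded above by \(\frac{1}{2\lambda }\Vert {\varvec{x}}\Vert ^2\) and below by its value at any fixed point of \(\mathrm{dom}\,f\), so it is finite-valued. Hence \(e_{\lambda }f\) is a continuous difference-of-convex function.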


Notes

  1. These follow from (i) and [25, Corollary 8.10].

  2. These follow from (i) and [25, Corollary 8.10].

  3. To see this, recall from [15, Proposition 3.1] that an element \({\varvec{\zeta }}^*\) of \(\mathbf{proj}_{{\varOmega }}({\varvec{y}})\) can be obtained as

    $$\begin{aligned} \zeta ^*_i = \begin{cases} {\tilde{\zeta }}^*_i & \text{if } i \in I^*,\\ 0 & \text{otherwise}, \end{cases} \end{aligned}$$

    where \({\tilde{\zeta }}^*_i = \mathrm{argmin}\{\frac{1}{2}(\zeta _i - y_i)^2:\; 0\le \zeta _i\le \tau \} = \max \{\min \{y_i,\tau \},0\}\), and \(I^*\) is an index set of size k corresponding to the k largest values of \(\{\frac{1}{2} y_i^2 - \frac{1}{2}({\tilde{\zeta }}^*_i - y_i)^2\}_{i=1}^n = \{\frac{1}{2} y_i^2 - \frac{1}{2}(\min \{\max \{y_i - \tau ,0\},y_i\})^2\}_{i=1}^n\). Since the function \(t\mapsto \frac{1}{2}t^2 - \frac{1}{2}(\min \{\max \{t - \tau ,0\},t\})^2\) is nondecreasing, we can let \(I^*\) correspond to any k largest entries of \({\varvec{y}}\). (A numpy sketch of this projection is given after these notes.)

  4. To see this, recall from [16, Corollary 2.3] and [15, Proposition 3.1] that an element \({\varvec{Y}} \in \mathbf{proj}_{{\tilde{{\varXi }}}_k}({\varvec{W}})\) can be computed as \({\varvec{Y}} = {\varvec{U}} \mathrm {Diag}({\varvec{\zeta }}^*){\varvec{V}}^\top \), where

    $$\begin{aligned} \zeta ^*_i = \begin{cases} {\tilde{\zeta }}^*_i & \text{if } i \in I^*,\\ 0 & \text{otherwise}, \end{cases} \end{aligned}$$

    where \({\tilde{\zeta }}^*_i = \mathrm{argmin}\{\frac{1}{2}(\zeta _i - \sigma _i)^2:\; |\zeta _i|\le \tau \} = \min \{\sigma _i,\tau \}\), and \(I^*\) is an index set of size k corresponding to the k largest values of \(\{\frac{1}{2}\sigma _i^2 - \frac{1}{2}({\tilde{\zeta }}^*_i - \sigma _i)^2\}_{i=1}^n = \{\frac{1}{2}\sigma _i^2 - \frac{1}{2}(\max \{0,\sigma _i - \tau \})^2\}_{i=1}^n\). Since \(t\mapsto \frac{1}{2}t^2 - \frac{1}{2}(\max \{0,t - \tau \})^2\) is nondecreasing for nonnegative t, we can take \(I^*\) to correspond to any k largest singular values. (A numpy sketch is given after these notes.)

  5. To see this, recall from [16, Proposition 2.8] and [15, Proposition 3.1] that an element \({\varvec{Y}} \in \mathbf{proj}_{{\tilde{{\varPi }}}_k}({\varvec{W}})\) can be computed as \({\varvec{Y}} = {\varvec{U}} \mathrm {Diag}({\varvec{\zeta }}^*){\varvec{V}}^\top \), where

    $$\begin{aligned} \zeta ^*_i = \begin{cases} {\tilde{\zeta }}^*_i & \text{if } i \in I^*,\\ 0 & \text{otherwise}, \end{cases} \end{aligned}$$

    where \({\tilde{\zeta }}^*_i = {\hbox {argmin}}\{\frac{1}{2}(\zeta _i - \lambda _i)^2:\; 0\le \zeta _i\le \tau \} = \max \{\min \{\lambda _i,\tau \},0\}\), and \(I^*\) is an index set of size k corresponding to the k largest values of \(\{\frac{1}{2}\lambda _i^2 - \frac{1}{2}({\tilde{\zeta }}^*_i - \lambda _i)^2\}_{i=1}^n = \{\frac{1}{2}\lambda _i^2 - \frac{1}{2}(\min \{\max \{\lambda _i - \tau ,0\},\lambda _i\})^2\}_{i=1}^n\). Since the function \(t\mapsto \frac{1}{2}t^2 - \frac{1}{2}(\min \{\max \{t - \tau ,0\},t\})^2\) is nondecreasing, we can let \(I^*\) correspond to any k largest entries of \({\varvec{\lambda }}\).

  6. This refers to the total number of inner iterations.

  7. We would like to point out that we are indeed using \({\varXi }_k\) in place of \({\tilde{{\varXi }}}_k\) in (40) and using S in place of \({\tilde{S}}\) in (41) in our experiments below. Notice that A3 is still satisfied because f is level-bounded.

  8. This refers to the total number of inner iterations.
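For concreteness, here is a minimal numpy sketch of the projection described in footnote 3, under our reading that \({\varOmega }\) consists of vectors with entries in \([0,\tau ]\) and at most k nonzeros; the function name and interface are ours, not the paper's:

```python
import numpy as np

def proj_omega(y, tau, k):
    """One element of proj_Omega(y) as in footnote 3: clip to [0, tau]
    on the indices of the k largest entries of y, zero elsewhere."""
    zeta = np.zeros_like(y, dtype=float)
    # The scoring function t -> t^2/2 - min(max(t - tau, 0), t)^2/2 is
    # nondecreasing, so I* may be taken as any k largest entries of y.
    idx = np.argsort(y)[-k:]
    zeta[idx] = np.clip(y[idx], 0.0, tau)
    return zeta
```

The matrix projection of footnote 4 works the same way at the level of singular values. A sketch, assuming the set constrains a matrix to at most k nonzero singular values, each of magnitude at most \(\tau \) (again our reading of the footnote):

```python
import numpy as np

def proj_xi_k(W, tau, k):
    """One element of the projection in footnote 4: keep the k largest
    singular values, truncated at tau, and zero out the rest."""
    U, sigma, Vt = np.linalg.svd(W, full_matrices=False)  # sigma is nonincreasing
    zeta = np.zeros_like(sigma)
    zeta[:k] = np.minimum(sigma[:k], tau)  # tilde-zeta_i = min(sigma_i, tau) on I*
    return U @ np.diag(zeta) @ Vt
```

The eigenvalue variant in footnote 5 is the same computation with the SVD replaced by a symmetric eigendecomposition and \(\min \{\sigma _i,\tau \}\) replaced by \(\max \{\min \{\lambda _i,\tau \},0\}\).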

References

  1. Ahn, M., Pang, J.S., Xin, J.: Difference-of-convex learning: directional stationarity, optimality, and sparsity. SIAM J. Optim. 27, 1637–1665 (2017)


  2. Asplund, E.: Differentiability of the metric projection in finite dimensional Euclidean space. Proc. Am. Math. Soc. 38, 218–219 (1973)


  3. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)


  4. Becker, S., Candès, E.J., Grant, M.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3, 165–218 (2011)


  5. Borsdorf, R., Higham, N.J., Raydan, M.: Computing a nearest correlation matrix with factor structure. SIAM J. Matrix Anal. Appl. 31, 2603–2622 (2010)


  6. Brodie, J., Daubechies, I., De Mol, C., Giannone, D., Loris, I.: Sparse and stable Markowitz portfolios. Proc. Natl. Acad. Sci. 106, 12267–12272 (2009)


  7. Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9, 717–772 (2009)


  8. Chen, C., Li, X., Tolman, C., Wang, S., Ye, Y.: Sparse portfolio selection via quasi-norm regularization. arXiv:1312.6350 (2013)

  9. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)


  10. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximations. Comput. Math. Appl. 2, 17–40 (1976)


  11. Gao, Y., Sun, D.: A majorized penalty approach for calibrating rank constrained correlation matrix problems. Technical report, National University of Singapore (2010)

  12. Gong, P., Zhang, C., Lu, Z., Huang, J., Ye, J.: A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In: Proceedings of the 30th International Conference on Machine Learning, 37–45 (2013)

  13. Li, G., Pong, T.K.: Global convergence of splitting methods for nonconvex composite optimization. SIAM J. Optim. 25, 2434–2460 (2015)


  14. Lu, Z., Li, X.: Sparse recovery via partial regularization: models, theory and algorithms. arXiv:1511.07293 (2015)

  15. Lu, Z., Zhang, Y.: Sparse approximation via penalty decomposition methods. SIAM J. Optim. 23, 2448–2478 (2013)


  16. Lu, Z., Zhang, Y., Li, X.: Penalty decomposition methods for rank minimization. Optim. Methods Softw. 30, 531–558 (2015)


  17. Lucet, Y.: Fast Moreau envelope computation I: numerical algorithms. Numer. Algorithms 43, 235–249 (2006)


  18. Markovsky, I.: Structured low-rank approximation and its applications. Automatica 44, 891–909 (2008)


  19. Markowitz, H.: Portfolio selection. J. Financ. 7, 77–91 (1952)


  20. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)


  21. Parekh, A., Selesnick, I.W.: Convex fused Lasso denoising with non-convex regularization and its use for pulse detection. In: Proceedings of IEEE Signal Processing in Medicine and Biology Symposium, 1–6 (2015)

  22. Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions for linear matrix equations via nuclear norm minimization. SIAM Rev. 52, 471–501 (2010)


  23. Richard, E., Savalle, P.-A., Vayatis, N.: Estimation of simultaneously sparse and low rank matrices. arXiv:1206.6474 (2012)

  24. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)


  25. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)


  26. Slawski, M., Hein, M.: Non-negative least squares for high-dimensional linear models: consistency and sparse recovery without regularization. Electron. J. Stat. 7, 3004–3056 (2013)


  27. Thiao, M., Pham, D.T., Le Thi, H.A.: A DC programming approach for sparse eigenvalue problem. In: Proceedings of the 27th International Conference on Machine Learning, 1063–1070 (2010)

  28. Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. B 58, 267–288 (1996)


  29. Tibshirani, R., Taylor, J.: The solution path of the generalized Lasso. Ann. Stat. 39, 1335–1371 (2011)


  30. Tono, K., Takeda, A., Gotoh, J.: Efficient DC algorithm for constrained sparse optimization. arXiv:1701.08498 (2017)

  31. Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)


  32. Yu, Y.L.: Better approximation and faster algorithm using the proximal average. Adv. Neural Inf. Process. Syst. 26, 458–466 (2013)


  33. Yu, Y.L., Zheng, X., Marchetti-Bowick, M., Xing, E.: Minimizing nonconvex non-separable functions. In: Proceedings of the 18th International Conference on Artificial Intelligence and Statistics 38, 1107–1115 (2015)


Author information


Corresponding author

Correspondence to Akiko Takeda.

Additional information

Ting Kei Pong is supported in part by Hong Kong Research Grants Council PolyU153085/16p. Akiko Takeda is supported by Grant-in-Aid for Scientific Research (C), 15K00031.

A Convergence of an NPG method with majorization

In this appendix, we consider the following optimization problem:

$$\begin{aligned} \min \limits _{{\varvec{x}}} ~~ F({\varvec{x}}) = h({\varvec{x}}) + P({\varvec{x}}) - g({\varvec{x}}), \end{aligned}$$
(44)

where h is an \(L_h\)-smooth function, P is a proper closed function with \(\inf P > -\infty \), and g is a continuous convex function. We assume in addition that there exists \({\varvec{x}}^0\in \mathrm{dom}\,P\) such that F is continuous on \({\varOmega }({\varvec{x}}^0):= \{{\varvec{x}}:\; F({\varvec{x}})\le F({\varvec{x}}^0)\}\) and that the set \({\varOmega }({\varvec{x}}^0)\) is compact. As a consequence, \(\inf F > -\infty \).

In Algorithm 2 below, we describe an algorithm, the nonmonotone proximal gradient method with majorization (\(\hbox {NPG}_{\mathrm{major}}\)), for solving (44). We first show that the line-search criterion is well-defined.

[Algorithm 2: the nonmonotone proximal gradient method with majorization (\(\hbox {NPG}_{\mathrm{major}}\)); displayed as an image.]
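Since the algorithm appears only as an image, the following Python sketch reconstructs the \(\hbox {NPG}_{\mathrm{major}}\) scheme from Proposition 1 and its proof. The acceptance test written below is the standard nonmonotone criterion of [31] and is our reading of (46); the oracle interfaces and all default parameter values are assumptions, not taken from the paper.

```python
import numpy as np

def npg_major(x0, grad_h, subgrad_g, prox_P, F,
              L_min=1e-4, L_max=1e8, tau=2.0, c=1e-4, M=4,
              max_iter=1000, tol=1e-8):
    """Sketch of NPG_major for problem (44): min F(x) = h(x) + P(x) - g(x).

    grad_h(x):    gradient of the L_h-smooth part h.
    subgrad_g(x): an element of the subdifferential of the convex part g.
    prox_P(v, L): a minimizer of (L/2)||x - v||^2 + P(x).
    F(x):         objective value, used in the acceptance test.
    """
    x = np.asarray(x0, dtype=float)
    F_recent = [F(x)]            # objective values over the window [t-M, t]
    L0 = L_min                   # initial stepsize estimate, kept in [L_min, L_max]
    for _ in range(max_iter):
        zeta = subgrad_g(x)      # majorize -g by its linearization at x
        g_lin = grad_h(x) - zeta
        L = L0
        while True:              # line search; terminates by Proposition 1
            # u minimizes (grad_h(x) - zeta)^T (y - x) + (L/2)||y - x||^2 + P(y)
            u = prox_P(x - g_lin / L, L)
            # nonmonotone sufficient-decrease test (our reading of (46))
            if F(u) <= max(F_recent) - 0.5 * c * np.sum((u - x) ** 2):
                break
            L *= tau
        if np.linalg.norm(u - x) <= tol:
            return u
        x = u
        F_recent = (F_recent + [F(x)])[-(M + 1):]
        L0 = min(max(L, L_min), L_max)
    return x
```

By Proposition 1, each line search passes the test once \(L \ge L_h + c\), so the inner loop ends after at most \({\tilde{n}}\) multiplications by \(\tau \).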

Proposition 1

For each t, the condition (46) is satisfied after at most

$$\begin{aligned} {\tilde{n}} := \max \left\{ \left\lceil \frac{\log (L_h + c) - \log (L_{\min })}{\log \tau }\right\rceil ,1\right\} \end{aligned}$$

inner iterations, which is independent of t. Consequently, \(\{{\bar{L}}_t\}\) is bounded.

Proof

For each t and \(L > 0\), let \({\varvec{u}}^t_L\) be an arbitrarily fixed element in

$$\begin{aligned} \mathop {\hbox {Argmin}}\limits _{{\varvec{x}}} \left\{ (\nabla h({\varvec{x}}^t) - {\varvec{\zeta }}^t)^\top ({\varvec{x}}-{\varvec{x}}^t) + \frac{L}{2}\Vert {\varvec{x}}-{\varvec{x}}^t\Vert ^2 + P({\varvec{x}})\right\} . \end{aligned}$$

Then we have

$$\begin{aligned} \begin{aligned} F({\varvec{u}}^t_L)&\le h({\varvec{x}}^t) + \nabla h({\varvec{x}}^t)^\top ({\varvec{u}}^t_L - {\varvec{x}}^t) + \frac{L_h}{2}\Vert {\varvec{u}}^t_L - {\varvec{x}}^t\Vert ^2 + P({\varvec{u}}^t_L) - g({\varvec{x}}^t) \\&\qquad - {{\varvec{\zeta }}^t}^\top ({\varvec{u}}^t_L - {\varvec{x}}^t)\\&= h({\varvec{x}}^t) - g({\varvec{x}}^t) + (\nabla h({\varvec{x}}^t)- {{\varvec{\zeta }}^t})^\top ({\varvec{u}}^t_L - {\varvec{x}}^t) + \frac{L_h}{2}\Vert {\varvec{u}}^t_L - {\varvec{x}}^t\Vert ^2 + P({\varvec{u}}^t_L)\\&\le F({\varvec{x}}^t) + \frac{L_h-L}{2}\Vert {\varvec{u}}^t_L - {\varvec{x}}^t\Vert ^2, \end{aligned} \end{aligned}$$

where the first inequality holds because of the \(L_h\)-smoothness of h, the convexity of g and the fact that \({\varvec{\zeta }}^t\in \partial g({\varvec{x}}^t)\), and the last inequality follows from the definition of \({\varvec{u}}^t_L\) as a minimizer. Thus, at the t-th iteration, the criterion (46) is satisfied by \({\varvec{u}} = {\varvec{u}}^t_L\) whenever \(L \ge L_h + c\). Since we have

$$\begin{aligned} \tau ^{{\tilde{n}}} L^0_t \ge \tau ^{{\tilde{n}}} L_{\min } \ge L_h+c, \end{aligned}$$

we conclude that (46) must be satisfied at or before the \({\tilde{n}}\)-th inner iteration. Consequently, we have \({\bar{L}}_t \le \tau ^{{\tilde{n}}}L_{\max }\) for all t. \(\square \)
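For illustration (our numbers, not the paper's): with \(L_h = 100\), \(c = 10^{-4}\), \(L_{\min } = 10^{-4}\) and \(\tau = 2\), the bound gives \({\tilde{n}} = \left\lceil \log _2\big ((L_h+c)/L_{\min }\big )\right\rceil = \lceil \log _2(1{,}000{,}001)\rceil = 20\), so each line search needs at most 20 inner iterations.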

The convergence of \(\hbox {NPG}_{\mathrm{major}}\) can now be proved similarly to [31, Lemma 4].

Proposition 2

Let \(\{{\varvec{x}}^t\}\) be the sequence generated by \(\hbox {NPG}_{\mathrm{major}}\). Then \(\Vert {\varvec{x}}^{t+1} - {\varvec{x}}^t\Vert \rightarrow 0\).
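For intuition, here is a sketch (ours) of the argument in the monotone special case where the test (46) compares against \(F({\varvec{x}}^t)\) only; the general nonmonotone case is handled as in [31, Lemma 4]. Summing the sufficient-decrease inequality over \(t = 0,\ldots ,T\) gives

$$\begin{aligned} \frac{c}{2}\sum _{t=0}^{T}\Vert {\varvec{x}}^{t+1}-{\varvec{x}}^t\Vert ^2 \le \sum _{t=0}^{T}\big (F({\varvec{x}}^t)-F({\varvec{x}}^{t+1})\big ) = F({\varvec{x}}^0)-F({\varvec{x}}^{T+1}) \le F({\varvec{x}}^0) - \inf F < \infty , \end{aligned}$$

so \(\sum _{t}\Vert {\varvec{x}}^{t+1}-{\varvec{x}}^t\Vert ^2 < \infty \) and hence \(\Vert {\varvec{x}}^{t+1}-{\varvec{x}}^t\Vert \rightarrow 0\).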


About this article


Cite this article

Liu, T., Pong, T.K. & Takeda, A. A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems. Math. Program. 176, 339–367 (2019). https://doi.org/10.1007/s10107-018-1327-8

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10107-018-1327-8

Keywords

Mathematics Subject Classification

Navigation