Optimality conditions for locally Lipschitz optimization with \(l_0\)-regularization


This paper investigates the locally Lipschitz optimization problem (LLOP) with \(l_0\)-regularization in a finite-dimensional space, a problem that is NP-hard in general but widely applicable in statistics, compressed sensing and deep learning. First, we introduce two classes of stationary points for this problem: subdifferential-stationary points and proximal-stationary points. Second, based on these two concepts, we analyze first-order necessary and sufficient optimality conditions for the LLOP with \(l_0\)-regularization. Finally, we present two examples to illustrate the validity of the proposed optimality conditions.
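
As a rough illustration of the proximal viewpoint behind the second notion, the sketch below computes one element of the proximal mapping of \(\tau\lambda\Vert x\Vert_0\), which is componentwise hard thresholding at \(\sqrt{2\tau\lambda}\), and uses it in a fixed-point test. The abstract does not state the paper's definition of a proximal-stationary point, so the test assumes the commonly used form \(x \in \mathrm{prox}_{\tau\lambda\Vert\cdot\Vert_0}(x - \tau g)\) with \(g\) a (sub)gradient of the loss at \(x\). The names `l0_prox` and `is_prox_stationary`, the parameters `lam` and `tau`, and the toy quadratic loss are all illustrative assumptions, not taken from the paper.

```python
import math


def l0_prox(z, lam, tau):
    """Return one minimizer of tau*lam*||x||_0 + 0.5*||x - z||^2 (hard thresholding)."""
    threshold = math.sqrt(2.0 * tau * lam)
    # Entries strictly above the threshold are kept; at an exact tie both 0 and
    # z_i are minimizers, and this sketch picks 0.
    return [zi if abs(zi) > threshold else 0.0 for zi in z]


def is_prox_stationary(x, g, lam, tau, tol=1e-12):
    """Check the assumed fixed-point condition x = l0_prox(x - tau*g)."""
    step = [xi - tau * gi for xi, gi in zip(x, g)]
    return all(abs(xi - pi) <= tol for xi, pi in zip(x, l0_prox(step, lam, tau)))


# Toy smooth case (an assumption): f(x) = 0.5*||x - b||^2, so g = x - b.
b = [3.0, 0.5]
x = [3.0, 0.0]
g = [xi - bi for xi, bi in zip(x, b)]
print(l0_prox([3.0, 0.5, -2.0, 0.1], lam=1.0, tau=0.5))
print(is_prox_stationary(x, g, lam=1.0, tau=0.5))
```

In this toy case the threshold is 1, so the small entry of `b` is zeroed and the sparse point `[3.0, 0.0]` passes the fixed-point test, whereas the dense point `b` itself does not.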





The authors would like to thank the associate editor and two anonymous referees for their constructive comments, which have significantly improved the quality of the paper. This work is supported by the National Natural Science Foundation of China (Nos. 11971052 and 11801325).


Corresponding author

Correspondence to Hui Zhang.


Cite this article

Zhang, H., Pan, L. & Xiu, N. Optimality conditions for locally Lipschitz optimization with \(l_0\)-regularization. Optim Lett 15, 189–203 (2021). https://doi.org/10.1007/s11590-020-01579-y



  • Locally Lipschitz optimization
  • \(l_0\)-Regularization
  • Subdifferential-stationary point
  • Proximal-stationary point
  • Application