
Mathematical Programming, Volume 169, Issue 1, pp 141–176

DC formulations and algorithms for sparse optimization problems

  • Jun-ya Gotoh
  • Akiko Takeda
  • Katsuya Tono
Full Length Paper, Series B

Abstract

We propose a DC (Difference of two Convex functions) formulation approach for sparse optimization problems with a cardinality or rank constraint. Using the largest-k norm, we provide an exact DC representation of the cardinality constraint. We then transform the cardinality-constrained problem into a penalty function form and derive exact penalty parameter values for several classes of problems, in particular for quadratic minimization problems, which often appear in practice. A DC Algorithm (DCA) is presented, in which the dual step at each iteration can be carried out efficiently because a subgradient of the largest-k norm is readily available. Furthermore, if there are no additional constraints, each DCA subproblem can be solved in linear time via a soft-thresholding operation. The framework is extended to the rank-constrained problem as well as to the cardinality- and rank-minimization problems. Numerical experiments demonstrate the efficiency of the proposed DCA in comparison with existing methods based on other penalty terms.
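For orientation, the exact DC representation referred to in the abstract can be summarized as follows; the notation \(\|x\|_{[k]}\) for the largest-\(k\) norm is a choice made for this sketch, and the precise statements and exact penalty parameter values are given in the paper. Writing \(\|x\|_{[k]} := \sum_{i=1}^{k} |x|_{(i)}\) for the sum of the \(k\) largest absolute entries of \(x \in \mathbb{R}^n\), one has
\[
\|x\|_1 - \|x\|_{[k]} = \sum_{i=k+1}^{n} |x|_{(i)} \;\ge\; 0,
\qquad \text{with equality if and only if } \|x\|_0 \le k .
\]
Since both \(\|\cdot\|_1\) and \(\|\cdot\|_{[k]}\) are convex (polyhedral) norms, the cardinality constraint admits an exact DC representation, and the constrained problem can be traded for a penalized DC problem,
\[
\min_{x \in S} \; f(x) \ \ \text{s.t.}\ \ \|x\|_0 \le k
\qquad \longrightarrow \qquad
\min_{x \in S} \; f(x) + \rho\,\bigl(\|x\|_1 - \|x\|_{[k]}\bigr),
\]
which is exact for a sufficiently large penalty parameter \(\rho > 0\).

The following is a minimal, illustrative Python sketch of the kind of DCA iteration the abstract describes, assuming a smooth least-squares loss and no additional constraints. The function names, the least-squares example, and the particular proximal-gradient-style subproblem step are illustrative choices for this sketch, not necessarily the paper's exact formulation.

import numpy as np

def largest_k_subgrad(x, k, rho):
    """A subgradient of rho * (largest-k norm) at x:
    rho * sign(x_i) on the k entries largest in absolute value, 0 elsewhere."""
    g = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]          # indices of the k largest |x_i|
    g[idx] = rho * np.sign(x[idx])
    return g

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def dca_sparse_least_squares(A, b, k, rho, eta, n_iter=500):
    """Illustrative proximal-DCA loop for
        min 0.5 * ||A x - b||^2 + rho * (||x||_1 - ||x||_[k]).
    Each iteration: a subgradient ("dual") step for the concave part,
    then a gradient step on the smooth loss followed by soft thresholding."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        y = largest_k_subgrad(x, k, rho)      # y in the subdifferential of rho*||.||_[k]
        grad = A.T @ (A @ x - b)              # gradient of the smooth loss
        x = soft_threshold(x - eta * (grad - y), eta * rho)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 100))
    x_true = np.zeros(100)
    x_true[:5] = rng.standard_normal(5)
    b = A @ x_true
    eta = 1.0 / np.linalg.norm(A, 2) ** 2     # step size <= 1/L for the smooth part
    x_hat = dca_sparse_least_squares(A, b, k=5, rho=1.0, eta=eta)
    print("nonzeros found:", np.count_nonzero(np.abs(x_hat) > 1e-6))

The dual step is a single sort (or linear-time selection) over the entries of x, and the primal step is a componentwise soft-thresholding operation, which is consistent with the linear-time-per-subproblem claim in the abstract for the unconstrained case.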

Keywords

Sparse optimization · Cardinality constraint · Rank constraint · DCA · Largest-k norm · Ky Fan k norm · Proximal operation

Mathematics Subject Classification

47A30 · 90C20 · 90C26 · 90C90

Notes

Acknowledgements

The research of the first author was supported by JSPS KAKENHI Grant Numbers 15K01204, 22510138, and 26242027. The research of the second author was supported by JST CREST Grant Number JPMJCR15K5, Japan. The authors are very grateful to the reviewers, whose comments enabled us to improve the readability of the paper.


Copyright information

© Springer-Verlag GmbH Germany and Mathematical Optimization Society 2017

Authors and Affiliations

  1. Department of Industrial and Systems Engineering, Chuo University, Tokyo, Japan
  2. Department of Mathematical Analysis and Statistical Inference, The Institute of Statistical Mathematics, Tokyo, Japan
  3. RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  4. Data Science Research Laboratories, NEC Corporation, Kanagawa, Japan
