Accelerating block coordinate descent methods with identification strategies

  • R. Lopes
  • S. A. Santos
  • P. J. S. Silva


This work concerns active-set identification strategies aimed at accelerating block coordinate descent methods (BCDM) applied to large-scale problems. We devise an identification function tailored for bound-constrained composite minimization, together with an associated, globally convergent version of the BCDM, called Active BCDM. The identification function gives rise to an efficient practical strategy for Lasso and \(\ell _1\)-regularized logistic regression. The computational performance of Active BCDM is assessed through comparative experiments on deterministic instances from the literature, against well-established and state-of-the-art methods that are particularly suited to the classes of applications under consideration. Active BCDM achieves fast results thanks to its identification strategy; in addition, an extra second-order step is employed, with a favorable cost-benefit trade-off.
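To make the idea concrete, the following sketch combines a cyclic coordinate descent solver for Lasso with a crude gradient-based inactivity test. This is an illustrative toy, not the paper's Active BCDM or its identification function: the screening rule (skip coordinates at zero whose smooth-part gradient lies strictly inside a shrunken interval \([-0.9\lambda, 0.9\lambda]\)) and all function names are assumptions made for the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cd_lasso(A, b, lam, n_iter=200):
    """Cyclic coordinate descent for min 0.5*||Ax - b||^2 + lam*||x||_1,
    restricting each sweep to coordinates not screened out as inactive."""
    m, n = A.shape
    x = np.zeros(n)
    col_sq = np.sum(A * A, axis=0)   # per-coordinate curvature constants
    r = b - A @ x                    # residual, maintained incrementally
    for _ in range(n_iter):
        grad = -A.T @ r              # gradient of the smooth part at sweep start
        # crude screening: zero coordinates whose gradient is well inside
        # [-lam, lam] are predicted inactive and skipped this sweep
        active = ~((x == 0) & (np.abs(grad) < 0.9 * lam))
        for i in np.where(active)[0]:
            if col_sq[i] == 0.0:
                continue
            g_i = -A[:, i] @ r
            x_new = soft_threshold(x[i] - g_i / col_sq[i], lam / col_sq[i])
            if x_new != x[i]:
                r -= A[:, i] * (x_new - x[i])
                x[i] = x_new
    return x
```

On a trivial instance with `A` the identity, the minimizer is the soft-thresholded data, e.g. `cd_lasso(np.eye(3), np.array([3.0, 0.5, -2.0]), 1.0)` returns `[2.0, 0.0, -1.0]`, and the middle coordinate is skipped by the screening test in every sweep. Work saved by such predictions is the source of the speedups the paper quantifies with its (more sophisticated) identification function.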


Block coordinate descent · Active-set identification · Large-scale optimization · \(\ell _1\) regularization

Mathematics Subject Classification

65K05 · 49M37 · 90C30 · 90C06 · 90C25



We are thankful for the comments and suggestions of two anonymous referees, which helped us to improve the presentation of our work.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Institute of Mathematics, University of Campinas, Campinas, Brazil
