
Distributed Block Coordinate Descent for Minimizing Partially Separable Functions

  • Conference paper
Numerical Analysis and Optimization

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 134)

Abstract

A distributed randomized block coordinate descent method for minimizing a convex function of a huge number of variables is proposed. The complexity of the method is analyzed under the assumption that the smooth part of the objective function is partially block separable. The number of iterations required is bounded by a function of the error and the degree of separability, which extends the results of Richtárik and Takáč (Parallel Coordinate Descent Methods for Big Data Optimization, Mathematical Programming, DOI:10.1007/s10107-015-0901-6) to a distributed environment. Several approaches to the distribution and synchronization of the computation across a cluster of multi-core computers are described, and promising computational results are provided.
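The following is a minimal single-machine sketch, not the authors' implementation, of the kind of parallel block coordinate descent step the abstract describes. It uses the step-size factor \(\beta = 1 + (\omega - 1)(\tau - 1)/\max(1, n-1)\) for \(\tau\)-nice sampling from Richtárik and Takáč [26], where \(\omega\) is the degree of partial separability; the quadratic test objective, the name `parallel_cd`, and all parameter choices are illustrative assumptions.

```python
import numpy as np

# Minimal single-machine sketch of parallel block coordinate descent on a
# partially separable objective; illustrative only, not the authors' code.
# Test problem: f(x) = 0.5 * ||A x - b||^2 with sparse A, so each term
# (row of A) couples at most `omega` coordinates.

def parallel_cd(A, b, tau, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Degree of partial separability: max number of nonzeros in a row of A.
    omega = int((A != 0).sum(axis=1).max())
    # Coordinate Lipschitz constants L_i = ||A_{:,i}||^2 (guarded against 0).
    L = np.maximum((A ** 2).sum(axis=0), 1e-12)
    # Step-size factor for tau-nice sampling (Richtarik & Takac [26]):
    # beta = 1 + (omega - 1)(tau - 1) / max(1, n - 1).
    beta = 1.0 + (omega - 1.0) * (tau - 1.0) / max(1.0, n - 1.0)
    x = np.zeros(n)
    for _ in range(iters):
        S = rng.choice(n, size=tau, replace=False)  # tau coordinates at random
        g = A.T[S] @ (A @ x - b)                    # partial gradients on S
        x[S] -= g / (beta * L[S])                   # updates applied in parallel
    return x
```

With \(\beta\) chosen this way, applying all \(\tau\) sampled updates simultaneously is safe for partially separable objectives: the smaller \(\omega\) is, the closer the parallel step length is to the serial one.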


Notes

  1. Note that \(\xi \in \{\lceil \tfrac{\omega}{C} \rceil, \ldots, \omega\}\).

  2. In fact, \(\vert \hat{Z} \vert = C\tau\) with probability 1; the sketch following these notes illustrates this.

  3. For the start of the algorithm we define \(\delta g_{l}^{(c)} = \delta G_{l}^{(c)} = \mathbf{0}\) for all \(l < 0\).
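A minimal sketch of the sampling described in note 2, assuming an even partition of the \(n\) blocks across \(C\) computers; the function name and interface are hypothetical, not taken from the paper.

```python
import random

# Hypothetical illustration of the distributed sampling in note 2: n blocks
# are split evenly across C computers, and each computer draws tau blocks
# uniformly without replacement from its own partition. Because the
# partitions are disjoint, the union Z_hat has exactly C * tau blocks.

def distributed_sample(n, C, tau, seed=None):
    rnd = random.Random(seed)
    size = n // C  # assume C divides n for simplicity
    partitions = [range(c * size, (c + 1) * size) for c in range(C)]
    Z_hat = set()
    for part in partitions:
        Z_hat.update(rnd.sample(part, tau))  # tau-nice sampling per computer
    assert len(Z_hat) == C * tau  # |Z_hat| = C * tau with probability 1
    return Z_hat

print(sorted(distributed_sample(n=12, C=3, tau=2, seed=42)))
```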

References

  1. Alham, N.K., Li, M., Liu, Y., Hammoud, S.: A MapReduce-based distributed SVM algorithm for automatic image annotation. Comput. Math. Appl. 62(7), 2801–2811 (2011)

  2. Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)

  3. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Upper Saddle River (1989)

  4. Chang, E.Y., Zhu, K., Wang, H., Bai, H., Li, J., Qiu, Z., Cui, H.: PSVM: parallelizing support vector machines on distributed computers. Adv. Neural Inf. Process. Syst. 20, 1–18 (2007)

  5. Fercoq, O., Richtárik, P.: Accelerated, parallel and proximal coordinate descent. arXiv:1312.5799 (2013)

  6. Fercoq, O., Richtárik, P.: Smooth minimization of nonsmooth functions with parallel coordinate descent methods. arXiv:1309.5885 (2013)

  7. Fercoq, O., Qu, Z., Richtárik, P., Takáč, M.: Fast distributed coordinate descent for non-strongly convex losses. In: IEEE Workshop on Machine Learning for Signal Processing (2014)

  8. Ge, D., Jiang, X., Ye, Y.: A note on the complexity of \(\ell_{p}\) minimization. Math. Program. 129(2), 285–299 (2011)

  9. Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Sathiya Keerthi, S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: Proceedings of the 25th International Conference on Machine Learning, ICML '08, pp. 408–415. ACM, New York (2008)

  10. InfiniBand Trade Association: InfiniBand Architecture Specification, vol. 1, Release 1.0 (2005)

  11. Jaggi, M., Smith, V., Takáč, M., Terhorst, J., Hofmann, T., Jordan, M.I.: Communication-efficient distributed dual coordinate ascent. In: Advances in Neural Information Processing Systems, vol. 27, pp. 3068–3076 (2014). http://papers.nips.cc/book/advances-in-neural-information-processing-systems-27-2014

  12. Lee, Y.T., Sidford, A.: Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems. In: 54th Annual Symposium on Foundations of Computer Science. IEEE, New York (2013)

  13. Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: RCV1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)

  14. LIBSVM Data: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html. Accessed 25 Oct 2014

  15. Liu, J., Wright, S.J., Ré, C., Bittorf, V.: An asynchronous parallel stochastic coordinate descent algorithm. arXiv:1311.1873 (2013)

  16. Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. arXiv:1305.4723 (2013)

  17. Luo, Z.Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72(1), 7–35 (1992). http://link.springer.com/article/10.1007%2FBF00939948?LI=true

  18. Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)

  19. Necoara, I., Clipici, D.: Distributed coordinate descent methods for composite minimization. arXiv:1312.5302 (2013)

  20. Nesterov, Y.: Introductory Lectures on Convex Optimization. Applied Optimization, vol. 87. Kluwer, Boston (2004)

  21. Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)

  22. Niu, F., Recht, B., Ré, C., Wright, S.J.: Hogwild!: a lock-free approach to parallelizing stochastic gradient descent. Adv. Neural Inf. Process. Syst. 24, 693–701 (2011)

  23. OpenMP Architecture Review Board: OpenMP Application Program Interface (2011)

  24. Patrascu, A., Necoara, I.: Random coordinate descent methods for \(\ell_{0}\) regularized convex optimization. arXiv:1403.6622 (2014)

  25. Richtárik, P., Takáč, M.: Efficient serial and parallel coordinate descent methods for huge-scale truss topology design. In: Operations Research Proceedings 2011, pp. 27–32. Springer, New York (2012)

  26. Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. DOI:10.1007/s10107-015-0901-6 (2015)

  27. Richtárik, P., Takáč, M.: Distributed coordinate descent method for learning with big data. arXiv:1310.2059 (2013)

  28. Richtárik, P., Takáč, M.: On optimal probabilities in stochastic coordinate descent methods. arXiv:1310.3438 (2013)

  29. Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)

  30. Saha, A., Tewari, A.: On the finite time convergence of cyclic coordinate descent methods. SIAM J. Optim. 23(1), 576–601 (2013)

  31. Salleh, N.S.M., Suliman, A., Ahmad, A.R.: Parallel execution of distributed SVM using MPI (CoDLib). In: Information Technology and Multimedia (ICIM), pp. 1–4. IEEE (2011)

  32. Scherrer, C., Tewari, A., Halappanavar, M., Haglin, D.: Feature clustering for accelerating parallel coordinate descent. Adv. Neural Inf. Process. Syst. 25, 28–36 (2012)

  33. Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)

  34. Shalev-Shwartz, S., Singer, Y., Srebro, N., Cotter, A.: Pegasos: primal estimated sub-gradient solver for SVM. Math. Program. 127(1), 3–30 (2011)

  35. Snir, M., Otto, S., Huss-Lederman, S., Walker, D., Dongarra, J.: MPI: The Complete Reference, Volume 1: The MPI Core, 2nd (revised) edn. MIT Press, Cambridge (1998)

  36. Takáč, M., Bijral, A.S., Richtárik, P., Srebro, N.: Mini-batch primal and dual methods for SVMs. J. Mach. Learn. Res. W&CP 28, 1022–1030 (2013)

  37. Tappenden, R., Richtárik, P., Gondzio, J.: Inexact coordinate descent: complexity and preconditioning. arXiv:1304.5530 (2013)

  38. Tappenden, R., Richtárik, P., Büke, B.: Separable approximations and decomposition methods for the augmented Lagrangian. Optim. Methods Softw. (2015). DOI:10.1080/10556788.2014.966824. arXiv:1308.6774

  39. Tappenden, R., Richtárik, P., Takáč, M.: On the complexity of parallel coordinate descent. Technical Report ERGO 15-001, The University of Edinburgh (2015). http://arxiv.org/abs/1503.03033

  40. Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)

  41. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387–423 (2008)

  42. Tseng, P., Yun, S.: Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl. 140, 513–535 (2009)

  43. Zhao, P., Zhang, T.: Stochastic optimization with importance sampling. arXiv:1401.2753 (2014)


Acknowledgements

The first author was supported by EPSRC grant EP/I017127/1 (Mathematics for Vast Digital Resources) in 2012 and by the EU FP7 INSIGHT project (318225) subsequently. The second author was supported by EPSRC grant EP/I017127/1. The third author was supported by the Centre for Numerical Algorithms and Intelligent Software, funded by EPSRC grant EP/G036136/1 and the Scottish Funding Council.

Author information

Corresponding author

Correspondence to Peter Richtárik.


Notation Glossary

The notation used throughout the chapter is summarized in Tables 3 and 4.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Mareček, J., Richtárik, P., Takáč, M. (2015). Distributed Block Coordinate Descent for Minimizing Partially Separable Functions. In: Al-Baali, M., Grandinetti, L., Purnama, A. (eds) Numerical Analysis and Optimization. Springer Proceedings in Mathematics & Statistics, vol 134. Springer, Cham. https://doi.org/10.1007/978-3-319-17689-5_11
