Abstract
A distributed randomized block coordinate descent method for minimizing a convex function of a huge number of variables is proposed. The complexity of the method is analyzed under the assumption that the smooth part of the objective function is partially block separable. The number of iterations required is bounded by a function of the error and the degree of separability, which extends the results of Richtárik and Takáč (Parallel Coordinate Descent Methods for Big Data Optimization, Mathematical Programming, DOI:10.1007/s10107-015-0901-6) to a distributed environment. Several approaches to distributing and synchronizing the computation across a cluster of multi-core computers are described, and promising computational results are provided.
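To give a concrete sense of the underlying iteration, the following is a minimal serial sketch of randomized block coordinate descent on a smooth convex quadratic, f(x) = ½‖Ax − b‖². It is illustrative only, not the paper's distributed algorithm: there is no partitioning across nodes, and all names (`randomized_block_cd`, `num_blocks`) are hypothetical. Each iteration samples one block of coordinates uniformly at random and takes a gradient step on that block with step size 1/L_i, where L_i is a per-block Lipschitz constant.

```python
import numpy as np

def randomized_block_cd(A, b, num_blocks=4, iters=2000, seed=0):
    """Serial randomized block coordinate descent for f(x) = 0.5*||Ax - b||^2."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    x = np.zeros(n)
    # Partition the coordinates into contiguous blocks.
    blocks = np.array_split(np.arange(n), num_blocks)
    # Per-block Lipschitz constants: L_i = ||A_i||_2^2 (squared spectral norm
    # of the column sub-matrix belonging to block i).
    L = [np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks]
    for _ in range(iters):
        i = rng.integers(num_blocks)            # sample a block uniformly
        blk = blocks[i]
        grad_blk = A[:, blk].T @ (A @ x - b)    # block of the full gradient
        x[blk] -= grad_blk / L[i]               # block gradient step
    return x
```

When A has full column rank, the iterates converge linearly (in expectation) to the least-squares solution; the distributed method of the paper additionally spreads the blocks over the nodes of a cluster and updates many blocks in parallel.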
Notes
1. Note that \(\xi \in \{\lceil \tfrac{\omega }{\mathit{C}}\rceil,\ldots,\omega \}\).
2. In fact, \(\vert \hat{\mathit{Z}}\vert = \mathit{C}\tau\) with probability 1.
3. For the start of the algorithm we define \(\delta g_{l}^{(\mathit{c})} =\delta G_{l}^{(\mathit{c})} = \mathbf{0}\) for all \(l < 0\).
References
Alham, N.K., Li, M., Liu, Y., Hammoud, S.: A MapReduce-based distributed SVM algorithm for automatic image annotation. Comput. Math. Appl. 62(7), 2801–2811 (2011)
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)
Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall Inc., Upper Saddle River (1989)
Chang, E.Y., Zhu, K., Wang, H., Bai, H., Li, J., Qiu, Z., Cui, H.: PSVM: parallelizing support vector machines on distributed computers. Adv. Neural Inf. Process. Syst. 20, 1–18 (2007)
Fercoq, O., Richtárik, P.: Accelerated, parallel and proximal coordinate descent. arXiv:1312.5799 (2013)
Fercoq, O., Richtárik, P.: Smooth minimization of nonsmooth functions with parallel coordinate descent methods. arXiv:1309.5885 (2013)
Fercoq, O., Qu, Z., Richtárik, P., Takáč, M.: Fast distributed coordinate descent for non-strongly convex losses. In: IEEE Workshop on Machine Learning for Signal Processing (2014)
Ge, D., Jiang, X., Ye, Y.: A note on the complexity of ℓp minimization. Math. Program. 129(2), 285–299 (2011)
Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Sathiya Keerthi, S., Sundararajan, S.: A dual coordinate descent method for large-scale linear SVM. In: Proceedings of the 25th International Conference on Machine Learning, ICML ’08, pp. 408–415. ACM, New York (2008)
InfiniBand Trade Association: InfiniBand Architecture Specification, vol. 1, Release 1.0 (2005)
Jaggi, M., Smith, V., Takáč, M., Terhorst, J., Hofmann, T., Jordan, M.I.: Communication-efficient distributed dual coordinate ascent. In: Advances in Neural Information Processing Systems, vol. 27, 3068–3076 (NIPS 2014) http://papers.nips.cc/book/advances-in-neural-information-processing-systems-27-2014
Lee, Y.T., Sidford, A.: Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems. In: 54th Annual Symposium on Foundations of Computer Science. IEEE, New York (2013)
Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: RCV1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)
LIBSVM Data: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html. Accessed 25 Oct 2014
Liu, J., Wright, S.J., Ré, C., Bittorf, V.: An asynchronous parallel stochastic coordinate descent algorithm. arXiv:1311.1873 (2013)
Lu, Z., Xiao, L.: On the complexity analysis of randomized block-coordinate descent methods. arXiv:1305.4723 (2013)
Luo, Z.Q., Tseng, P.: On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 72(1), 7–35 (1992) http://link.springer.com/article/10.1007%2FBF00939948?LI=true
Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)
Necoara, I., Clipici, D.: Distributed coordinate descent methods for composite minimization. arXiv:1312.5302 (2013)
Nesterov, Y.: Introductory Lectures on Convex Optimization. Applied Optimization, vol. 87. Kluwer, Boston (2004)
Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2), 341–362 (2012)
Niu, F., Recht, B., Ré, C., Wright, S.J.: Hogwild!: a lock-free approach to parallelizing stochastic gradient descent. Adv. Neural Inf. Process. Syst. 24, 693–701 (2011)
OpenMP Architecture Review Board: OpenMP Application Program Interface (2011)
Patrascu, A., Necoara, I.: Random coordinate descent methods for ℓ0 regularized convex optimization. arXiv:1403.6622 (2014)
Richtárik, P., Takáč, M.: Efficient serial and parallel coordinate descent methods for huge-scale truss topology design. In: Operations Research Proceedings 2011, pp. 27–32. Springer, New York (2012)
Richtárik, P., Takáč, M.: Parallel coordinate descent methods for big data optimization. Math. Program. DOI:10.1007/s10107-015-0901-6 (2012)
Richtárik, P., Takáč, M.: Distributed coordinate descent method for learning with big data. arXiv:1310.2059 (2013)
Richtárik, P., Takáč, M.: On optimal probabilities in stochastic coordinate descent methods. arXiv:1310.3438 (2013)
Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Program. 144(1–2), 1–38 (2014)
Saha, A., Tewari, A.: On the finite time convergence of cyclic coordinate descent methods. SIAM J. Optim. 23(1), 576–601 (2013)
Salleh, N.S.M., Suliman, A., Ahmad, A.R.: Parallel execution of distributed SVM using MPI (CoDLib). In: Information Technology and Multimedia (ICIM), pp. 1–4. IEEE (2011)
Scherrer, C., Tewari, A., Halappanavar, M., Haglin, D.: Feature clustering for accelerating parallel coordinate descent. Adv. Neural Inf. Process. Syst. 25, 28–36 (2012)
Shalev-Shwartz, S., Zhang, T.: Stochastic dual coordinate ascent methods for regularized loss. J. Mach. Learn. Res. 14(1), 567–599 (2013)
Shalev-Shwartz, S., Singer, Y., Srebro, N., Cotter, A.: Pegasos: primal estimated sub-gradient solver for SVM. Math. Program. 127(1), 3–30 (2011)
Snir, M., Otto, S., Huss-Lederman, S., Walker, D., Dongarra, J.: MPI-The Complete Reference, Volume 1: The MPI Core, 2nd (revised) edn. MIT Press, Cambridge, MA (1998)
Takáč, M., Bijral, A.S., Richtárik, P., Srebro, N.: Mini-batch primal and dual methods for SVMs. J. Mach. Learn. Res. W&CP 28, 1022–1030 (2013)
Tappenden, R., Richtárik, P., Gondzio, J.: Inexact coordinate descent: complexity and preconditioning. arXiv:1304.5530 (2013)
Tappenden, R., Richtárik, P., Büke, B.: Separable approximations and decomposition methods for the augmented Lagrangian. Optim. Methods Softw. arXiv:1308.6774 (2015) DOI:10.1080/10556788.2014.966824
Tappenden, R., Richtárik, P., Takáč, M.: On the complexity of parallel coordinate descent. Technical Report ERGO 15-001, The University of Edinburgh (2015) http://arxiv.org/abs/1503.03033
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387–423 (2008)
Tseng, P., Yun, S.: Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl. 140, 513–535 (2009)
Zhao, P., Zhang, T.: Stochastic optimization with importance sampling. arXiv:1401.2753 (2014)
Acknowledgements
The first author was supported by EPSRC grant EP/I017127/1 (Mathematics for Vast Digital Resources) in 2012 and by the EU FP7 INSIGHT project (318225) subsequently. The second author was supported by EPSRC grant EP/I017127/1. The third author was supported by the Centre for Numerical Algorithms and Intelligent Software, funded by EPSRC grant EP/G036136/1 and the Scottish Funding Council.
Notation Glossary
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Mareček, J., Richtárik, P., Takáč, M. (2015). Distributed Block Coordinate Descent for Minimizing Partially Separable Functions. In: Al-Baali, M., Grandinetti, L., Purnama, A. (eds) Numerical Analysis and Optimization. Springer Proceedings in Mathematics & Statistics, vol 134. Springer, Cham. https://doi.org/10.1007/978-3-319-17689-5_11
DOI: https://doi.org/10.1007/978-3-319-17689-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17688-8
Online ISBN: 978-3-319-17689-5
eBook Packages: Mathematics and Statistics (R0)