Abstract
Asynchronous FTRL-proximal and L2 regularization performed at the server are two widely used tricks on Parameter Server, an implementation of delayed SGD. What they have in common is leaving part of the update computation on the server, which makes the transmitted data sparse and thus reduces the network burden. However, the convergence of these tricks has not been well established. In this paper, building on this commonality, we propose a more general algorithm, asynchronous COMID, and prove its regret bound. We show that asynchronous FTRL-proximal and server-side L2 regularization are applications of asynchronous COMID, which establishes the convergence of both tricks. We then conduct experiments to verify the theoretical results. Experimental results show that, compared with delayed SGD on Parameter Server, asynchronous COMID reduces the network burden without harming the convergence speed or the final output.
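To make the shared idea concrete, the following minimal sketch (an assumption for illustration, not code from the paper; the names `server_comid_step`, `prox_l1`, and `prox_l2` are hypothetical) shows a single COMID-style update in which a worker transmits only a gradient computed on stale weights, while the regularization part of the update stays on the server. With the Euclidean Bregman divergence, this composite step reduces to a proximal operator applied to a plain gradient step.

```python
import numpy as np

def prox_l2(v, tau):
    """Proximal operator of (tau/2) * ||.||_2^2: simple shrinkage."""
    return v / (1.0 + tau)

def prox_l1(v, tau):
    """Proximal operator of tau * ||.||_1: soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def server_comid_step(w, grad, eta, lam, prox=prox_l2):
    """One COMID-style update done on the server (illustrative sketch).

    The worker only transmits `grad`, a possibly stale gradient computed on
    an old copy of the weights; the regularization part of the update stays
    on the server:
        w <- argmin_v  eta*<grad, v> + 0.5*||v - w||^2 + eta*lam*r(v),
    which, for the Euclidean Bregman divergence used here, is the proximal
    operator of eta*lam*r applied to the plain gradient step.
    """
    return prox(w - eta * grad, eta * lam)

# Hypothetical usage: a delayed worker sends a gradient computed on stale
# weights, and the server applies the composite (COMID) step.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=5)           # current server weights
    stale_grad = rng.normal(size=5)  # gradient from a delayed worker
    w = server_comid_step(w, stale_grad, eta=0.1, lam=0.01, prox=prox_l1)
    print(w)
```

Because the proximal/regularization step is applied only at the server, the worker-to-server traffic is just the (often sparse) gradient, which is the data-sparsification effect the abstract refers to.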
Acknowledgement
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61502450, 61432018, and 61521092, and by the National Key R&D Program of China under Grant Nos. 2016YFB0200800, 2017YFB0202302, and 2016YFE0100300.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Daning, C., Shigang, L., Yunquan, Z. (2019). Asynchronous COMID: The Theoretic Basis for Transmitted Data Sparsification Tricks on Parameter Server. In: Ren, R., Zheng, C., Zhan, J. (eds) Big Scientific Data Benchmarks, Architecture, and Systems. SDBA 2018. Communications in Computer and Information Science, vol 911. Springer, Singapore. https://doi.org/10.1007/978-981-13-5910-1_6
DOI: https://doi.org/10.1007/978-981-13-5910-1_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-5909-5
Online ISBN: 978-981-13-5910-1