
Asynchronous COMID: The Theoretic Basis for Transmitted Data Sparsification Tricks on Parameter Server

  • Conference paper
Big Scientific Data Benchmarks, Architecture, and Systems (SDBA 2018)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 911)

Abstract

Asynchronous FTRL-proximal and the L2-norm step done at the server are two widely used tricks on Parameter Server, an implementation of delayed SGD. What they have in common is that part of the update computation is left on the server, which reduces the network burden by keeping the transmitted data sparse. However, the convergence of these tricks has not been well proved. In this paper, building on this common structure, we propose a more general algorithm, asynchronous COMID, and prove its regret bound. We then show that asynchronous FTRL-proximal and the L2-norm step done at the server are instances of asynchronous COMID, which establishes the convergence of both tricks. Finally, we conduct experiments to verify the theoretical results. The experiments show that, compared with delayed SGD on Parameter Server, asynchronous COMID reduces the network burden without harming the convergence rate or the final output.
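For orientation, the update the abstract refers to can be written in the standard COMID (composite objective mirror descent) form. The notation below is a reconstruction from that literature, not the paper's exact formulation; the delay model, the step size eta, and the symbols are illustrative assumptions.

```latex
% Asynchronous (delayed) COMID step on the composite objective f_t(x) + r(x).
% The gradient is evaluated at a stale iterate x_{t-tau}, where tau is the
% staleness introduced by the parameter server; B_psi is a Bregman divergence
% and r is the (possibly nonsmooth) regularizer.
\[
  x_{t+1} \;=\; \arg\min_{x}\;
      \eta \,\bigl\langle \nabla f_{t-\tau}(x_{t-\tau}),\, x \bigr\rangle
      \;+\; B_{\psi}(x,\, x_t)
      \;+\; \eta\, r(x)
\]
```

The sparsification tricks named in the abstract correspond to splitting this step between worker and server: the worker computes and pushes only the (typically sparse) stochastic gradient, and the regularization term is applied locally on the server, so the data that crosses the network stays sparse. The sketch below illustrates this split for the Euclidean case B_psi(x, y) = ||x - y||^2 / 2; the function names and the logistic-loss worker are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def worker_gradient(w_stale, x, y):
    """Stochastic logistic-loss gradient at (possibly stale) parameters.
    A sparse feature vector x yields a gradient with the same sparsity
    pattern, so the message pushed to the server stays small."""
    p = 1.0 / (1.0 + np.exp(-x @ w_stale))
    return (p - y) * x

def server_comid_step(w, grad, eta=0.1, lam=1e-3, reg="l2"):
    """Server-side COMID step with the Euclidean Bregman divergence.
    reg="l2" mirrors the 'L2 norm done at server' trick; reg="l1" gives a
    soft-thresholding update in the spirit of FTRL-proximal with L1."""
    z = w - eta * grad                       # plain delayed-SGD step
    if reg == "l2":
        return z / (1.0 + eta * lam)         # prox of (lam/2) * ||w||_2^2
    return np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0)  # prox of lam * ||w||_1
```

Under this split, the worker-to-server traffic consists only of sparse gradients, which is the transmitted-data sparsification the title refers to.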

Acknowledgement

This work was supported by the National Natural Science Foundation of China under Grants No. 61502450, No. 61432018, and No. 61521092, and by the National Key R&D Program of China under Grants No. 2016YFB0200800, No. 2017YFB0202302, and No. 2016YFE0100300.

Author information

Corresponding author

Correspondence to Li Shigang.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Daning, C., Shigang, L., Yunquan, Z. (2019). Asynchronous COMID: The Theoretic Basis for Transmitted Data Sparsification Tricks on Parameter Server. In: Ren, R., Zheng, C., Zhan, J. (eds) Big Scientific Data Benchmarks, Architecture, and Systems. SDBA 2018. Communications in Computer and Information Science, vol 911. Springer, Singapore. https://doi.org/10.1007/978-981-13-5910-1_6

  • DOI: https://doi.org/10.1007/978-981-13-5910-1_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-5909-5

  • Online ISBN: 978-981-13-5910-1

  • eBook Packages: Computer Science, Computer Science (R0)
