Asynchronous Computation Model for Large-Scale Iterative Computations

Cloud Computing for Data-Intensive Applications

Abstract

Iterative algorithms are widespread in machine learning and data mining applications. To scale to massive data sets, these algorithms have to be implemented in a large-scale distributed environment. However, synchronous iterations can perform unexpectedly poorly when a few stragglers hold up every other worker at the synchronization barrier, which is common in a heterogeneous distributed environment and especially in the cloud. To bypass the synchronization barriers in iterative computations, this chapter introduces an asynchronous iteration model, delta-based accumulative iterative computation (DAIC). Unlike traditional iterative computations, which update the result in each iteration based on the result from the previous iteration, DAIC updates the result asynchronously by accumulating the “changes” between iterations. This chapter presents a general asynchronous computation model to describe DAIC and introduces Maiter, a distributed framework for asynchronous iteration. The experimental results show that Maiter outperforms many other state-of-the-art frameworks.
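The idea of accumulating “changes” can be illustrated with a minimal single-process sketch. It is not Maiter's implementation. It assumes PageRank (unnormalized) as the example algorithm, and the names `daic_pagerank`, `sync_pagerank`, `out_links`, and `damping` are hypothetical. Each node keeps an accumulated value and a pending delta; nodes are processed one at a time in random order with no barrier, and the result is compared against a conventional synchronous iteration.

```python
import random


def daic_pagerank(out_links, damping=0.8, tol=1e-10, seed=0):
    """Delta-accumulation sketch (assumed PageRank example, not Maiter's code)."""
    rng = random.Random(seed)
    nodes = list(out_links)
    value = {v: 0.0 for v in nodes}            # accumulated result so far
    delta = {v: 1.0 - damping for v in nodes}  # change received but not yet applied
    while sum(delta.values()) > tol:
        v = rng.choice(nodes)  # arbitrary order: no barrier between "iterations"
        d = delta[v]
        if d == 0.0:
            continue
        value[v] += d          # accumulate the pending change
        delta[v] = 0.0
        for w in out_links[v]:
            # propagate the change (not the full value) to out-neighbors
            delta[w] += damping * d / len(out_links[v])
    return value


def sync_pagerank(out_links, damping=0.8, iters=200):
    """Conventional synchronous iteration, used only as a reference."""
    value = {v: 0.0 for v in out_links}
    for _ in range(iters):
        new = {v: 1.0 - damping for v in out_links}
        for v, targets in out_links.items():
            for w in targets:
                new[w] += damping * value[v] / len(targets)
        value = new            # every node waits for the full round to finish
    return value


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
async_result = daic_pagerank(graph)
sync_result = sync_pagerank(graph)
```

In this sketch both functions approach the same fixed point; changing `seed` changes the processing order but not the result beyond the chosen tolerance.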

Notes

  1. http://code.google.com/p/maiter/.

  2. Open MPI. http://www.open-mpi.org/.

  3. More implementation example code is provided on Maiter’s Google Code website, http://code.google.com/p/maiter/.

  4. http://snap.stanford.edu/data/.

  5. The details of the synthetic data sets can be found in [33].

References

  1. S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and M. Aly. Video suggestion and discovery for youtube: taking random walks through the view graph. In Proc. Int’l Conf. World Wide Web (WWW ‘08), pages 895–904, 2008.

  2. G. M. Baudet. Asynchronous iterative methods for multiprocessors. J. ACM, 25:226–244, April 1978.

  3. D. P. Bertsekas. Distributed asynchronous computation of fixed points. Math. Programming, 27:107–120, 1983.

  4. S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst., 30:107–117, April 1998.

  5. S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proc. Seventh Int’l Conf. World Wide Web (WWW ‘98), pages 107–117, 1998.

  6. Y. Bu, B. Howe, M. Balazinska, and M. D. Ernst. HaLoop: Efficient iterative data processing on large clusters. Proc. VLDB Endow., 3(1), 2010.

  7. K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst., 3(1):63–75, Feb. 1985.

  8. D. Chazan and W. Miranker. Chaotic relaxation. Linear Algebra and its Applications, 2(2):199–222, 1969.

  9. J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: a runtime for iterative mapreduce. In Proc. IEEE Int’l Workshop MapReduce (MapReduce ‘10), pages 810–818, 2010.

  10. A. Frommer and D. B. Szyld. On asynchronous iterations. J. Comput. Appl. Math., 123:201–216, November 2000.

  11. K. Kambatla, N. Rapolu, S. Jagannathan, and A. Grama. Asynchronous algorithms in mapreduce. In Proc. IEEE Conf. Cluster (Cluster ‘10), pages 245–254, 2010.

  12. U. Kang, C. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system implementation and observations. In Proc. IEEE Int’l Conf. Data Mining (ICDM ‘09), pages 229–238, 2009.

  13. L. Katz. A new status index derived from sociometric analysis. Psychometrika, 1953.

  14. J. M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46:604–632, 1999.

  15. G. Kollias, E. Gallopoulos, and D. B. Szyld. Asynchronous iterative computations with web information retrieval structures: The pagerank case. In PARCO, volume 33 of John von Neumann Institute for Computing Series, pages 309–316, 2005.

  16. D. Lee and S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788–791, 1999.

  17. D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol., 58:1019–1031, May 2007.

  18. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed graphlab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8), 2012.

  19. G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proc. ACM SIGMOD, pages 135–146, 2010.

  20. F. McSherry. A uniform approach to accelerated pagerank computation. In Proc. Int’l Conf. World Wide Web (WWW ‘05), pages 575–582, 2005.

  21. F. McSherry, D. Murray, R. Isaacs, and M. Isard. Differential dataflow. In Proc. Biennial Conf. Innovative Data Systems Research (CIDR ‘13), 2013.

  22. J. C. Miellou, D. El Baz, and P. Spiteri. A new class of asynchronous iterative algorithms with order intervals. Math. Comput., 67:237–255, January 1998.

  23. D. G. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy, and S. Hand. Ciel: A universal execution engine for distributed data-flow computing. In Proc. USENIX Symp. Networked Systems Design and Implementation (NSDI ‘11), 2011.

  24. R. Power and J. Li. Piccolo: Building fast, distributed programs with partitioned tables. In Proc. USENIX Symp. Operating Systems Design and Implementation (OSDI ‘10), 2010.

  25. S. R. Mihaylov, Z. G. Ives, and S. Guha. Rex: Recursive, delta-based data-centric computation. Proc. VLDB Endow., 5(11), 2012.

  26. H. H. Song, T. W. Cho, V. Dave, Y. Zhang, and L. Qiu. Scalable proximity estimation and link prediction in online social networks. In Proc. Int’l Conf. Internet Measurement (IMC ‘09), pages 322–335, 2009.

  27. G. Wang, W. Xie, A. Demers, and J. Gehrke. Asynchronous large-scale graph processing made easy. In Proc. Biennial Conf. Innovative Data Systems Research (CIDR ‘13), 2013.

  28. E. Yom-Tov and N. Slonim. Parallel pairwise clustering. In Proc. SIAM Intl. Conf. Data Mining (SDM ‘09), pages 745–755, 2009.

  29. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Graphlab: A new framework for parallel machine learning. In Proc. Int’l Conf. Uncertainty in Artificial Intelligence (UAI ‘10), 2010.

  30. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proc. USENIX Symp. Networked Systems Design and Implementation (NSDI ‘12), 2012.

  31. Y. Zhang, Q. Gao, L. Gao, and C. Wang. Priter: a distributed framework for prioritized iterative computations. In Proc. ACM Symp. Cloud Computing (SOCC ‘11), 2011.

  32. Y. Zhang, Q. Gao, L. Gao, and C. Wang. imapreduce: A distributed computing framework for iterative computation. J. Grid Comput., 10(1):47–68, 2012.

  33. Y. Zhang, Q. Gao, L. Gao, and C. Wang. Maiter: A message-passing distributed framework for accumulative iterative computation, http://faculty.neu.edu.cn/cc/zhangyf/papers/maiter-full.pdf. Northeastern University technical report, 2012.

Acknowledgements

This work was partially supported by U.S. NSF grants (CNS-1217284, CCF-1018114), the National Natural Science Foundation of China (61300023), and the Fundamental Research Funds for the Central Universities (N120416001, N120416001).

Author information

Correspondence to Yanfeng Zhang.

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Zhang, Y., Gao, Q., Gao, L., Wang, C. (2014). Asynchronous Computation Model for Large-Scale Iterative Computations. In: Li, X., Qiu, J. (eds) Cloud Computing for Data-Intensive Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1905-5_13

  • DOI: https://doi.org/10.1007/978-1-4939-1905-5_13

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-1904-8

  • Online ISBN: 978-1-4939-1905-5

  • eBook Packages: Computer Science, Computer Science (R0)
