Asynchronous Computation Model for Large-Scale Iterative Computations

Zhang, Yanfeng; Gao, Qixin; Gao, Lixin; Wang, Cuirong

doi:10.1007/978-1-4939-1905-5_13

Yanfeng Zhang³,
Qixin Gao⁴,
Lixin Gao⁵ &
…
Cuirong Wang⁴

1527 Accesses

Abstract

Iterative algorithms are widely existed in machine learning and data mining applications. These algorithms have to be implemented in a large-scale distributed environment in order to scale to massive data sets. While synchronous iterations might result in unexpected poor performance due to some particular stragglers in a heterogeneous distributed environment, especially in a cloud environment. To bypass the synchronization barriers in iterative computations, this chapter introduces an asynchronous iteration model, delta-based accumulative iterative computation (DAIC). Different from traditional iterative computations, which iteratively update the result based on the result from the previous iteration, DAIC asynchronously updates the result by accumulating the “changes” between iterations. This chapter presents a general asynchronous computation model to describe DAIC and introduces a distributed framework for asynchronous iteration, Maiter. The experimental results show that Maiter outperforms many other state-of-the-art frameworks.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 179.00; Price excludes VAT (USA)

Hardcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
http://code.google.com/p/maiter/.
2.
Open MPI. http://www.open-mpi.org/.
3.
More implementation example codes are provided on Maiter’s Google Code website http://code.google.com/p/maiter/.
4.
http://snap.stanford.edu/data/.
5.
The details of synthetic data sets can be found in [33].

References

S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and M. Aly. Video suggestion and discovery for youtube: taking random walks through the view graph. In Proc. Int’l Conf. World Wide Web (WWW ‘08), pages 895–904, 2008.
Google Scholar
G. M. Baudet. Asynchronous iterative methods for multiprocessors. J. ACM, 25:226–244, April 1978.
Article MATH MathSciNet Google Scholar
D. P. Bertsekas. Distributed asynchronous computation of fixed points. Math. Programming, 27:107–120, 1983.
Article MATH MathSciNet Google Scholar
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst., 30:107–117, April 1998.
Article Google Scholar
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proc. Seventh Int’l Conf. World Wide Web (WWW ‘98), pages 107–117, 1998.
Google Scholar
Y. Bu, B. Howe, M. Balazinska, and D. M. Ernst. Haloop: Efficient iterative data processing on large clusters. Proc. VLDB Endow., 3(1), 2010.
Google Scholar
K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst., 3(1):63–75, Feb. 1985.
Article Google Scholar
D. Chazan and W. Miranker. Chaotic relaxation. Linear Algebra and its Applications, 2(2):199–222, 1969.
Article MATH MathSciNet Google Scholar
J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: a runtime for iterative mapreduce. In Proc. IEEE Int’l Workshop MapReduce (MapReduce ‘10), pages 810–818, 2010.
Google Scholar
A. Frommer and D. B. Szyld. On asynchronous iterations. J. Comput. Appl. Math., 123:201–216, November 2000.
Article MATH MathSciNet Google Scholar
K. Kambatla, N. Rapolu, S. Jagannathan, and A. Grama. Asynchronous algorithms in mapreduce. In Proc. IEEE Conf. Cluster (Cluster’ 10), pages 245 –254, 2010.
Google Scholar
U. Kang, C. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system implementation and observations. In Proc. IEEE Int’l Conf. Data Mining (ICDM ‘09), pages 229 –238, 2009.
Google Scholar
L. Katz. A new status index derived from sociometric analysis. Psychometrika, 1953.
Google Scholar
J. M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46:604–632, 1999.
Article MATH MathSciNet Google Scholar
G. Kollias, E. Gallopoulos, and D. B. Szyld. Asynchronous iterative computations with web information retrieval structures: The pagerank case. In PARCO, volume 33 of John von Neumann Institute for Computing Series, pages 309–316, 2005.
Google Scholar
D. Lee and S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788–791, 1999.
Article Google Scholar
D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol., 58:1019–1031, May 2007.
Article Google Scholar
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed graphlab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8), 2012.
Google Scholar
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proc. ACM SIGMOD, pages 135–146, 2010.
Google Scholar
F. McSherry. A uniform approach to accelerated pagerank computation. In Proc. Int’l Conf. World Wide Web (WWW ‘05), pages 575–582, 2005.
Google Scholar
F. McSherry, D. Murray, R. Isaacs, and M. Isard. Differential dataflow. In Proc. Biennial Conf. Innovative Data Systems Research (CIDR ‘13), 2013.
Google Scholar
J. C. Miellou, D. El Baz, and P. Spiteri. A new class of asynchronous iterative algorithms with order intervals. Math. Comput., 67:237–255, January 1998.
Article MATH Google Scholar
D. G. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy, and S. Hand. Ciel: A universal execution engine for distributed data-flow computing. In Proc. USEINX Symp. Networked Systems Design and Implementation (NSDI ‘11), 2011.
Google Scholar
R. Power and J. Li. Piccolo: Building fast, distributed programs with partitioned tables. In Proc. USENIX Symp. Opearting Systems Design and Implementation (OSDI ‘10), 2010.
Google Scholar
M. S. R., I. G. Ives, and G. Sudipto. Rex: Recursive, deltabased datacentric computation. Proc. VLDB Endow., 5(8), 2012.
Google Scholar
H. H. Song, T. W. Cho, V. Dave, Y. Zhang, and L. Qiu. Scalable proximity estimation and link prediction in online social networks. In Proc. Int’l Conf. Internet Measurement (IMC ‘09), pages 322–335, 2009.
Google Scholar
G. Wang, W. Xie, A. Demers, and J. Gehrke. Asynchronous large-scale graph processing made easy. In Proc. Biennial Conf. Innovative Data Systems Research (CIDR ‘13), 2013.
Google Scholar
E. Yom-Tov and N. Slonim. Parallel pairwise clustering. In Proc. SIAM Intl. Conf. Data Mining (SDM ‘09), pages 745–755, 2009.
Google Scholar
L. Yucheng, G. Joseph, K. Aapo, B. Danny, G. Carlos, and M. H. Joseph. Graphlab: A new framework for parallel machine learning. In Proc. Int’l Conf. Uncertainty in Artificial Intelligence (UAI ‘10), 2010.
Google Scholar
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for. in-memory cluster computing. In Proc. USEINX Symp. Networked Systems Design and Implementation (NSDI’12), 2012.
Google Scholar
Y. Zhang, Q. Gao, L. Gao, and C. Wang. Priter: a distributed framework for prioritized iterative computations. In Proc. ACM Symp. Cloud Computing (SOCC ‘11), 2011.
Google Scholar
Y. Zhang, Q. Gao, L. Gao, and C. Wang. imapreduce: A distributed computing framework for iterative computation. J. Grid Comput., 10(1):47–68, 2012.
Google Scholar
Y. Zhang, Q. Gao, L. Gao, and C. Wang. Maiter: A message-passing distributed framework for accumulative iterative computation, http://faculty.neu.edu.cn/cc/zhangyf/papers/maiter-full.pdf. Northeastern university techical report, 2012.

Download references

Acknowledgements

This work was partially supported by U.S. NSF grants (CNS-1217284, CCF-1018114), National Natural Science Foundation of China (61300023), Fundamental Research Funds for the Central Universities (N120416001, N120416001).

Author information

Authors and Affiliations

Northeastern University, Lane 3, No. 11, Wenhua Road, Shenyang, Liaoning, 110819, China
Yanfeng Zhang
Northeastern University at Qinhuangdao, 143 Taishan Road, Qinhuangdao, Hebei, 066004, China
Qixin Gao & Cuirong Wang
University of Massachusetts Amherst, 151 Holdsworth Way, Amherst, MA, 01003, USA
Lixin Gao

Authors

Yanfeng Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Qixin Gao
View author publications
You can also search for this author in PubMed Google Scholar
Lixin Gao
View author publications
You can also search for this author in PubMed Google Scholar
Cuirong Wang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yanfeng Zhang .

Editor information

Editors and Affiliations

University of Florida, Gainesville, Florida, USA
Xiaolin Li
Indiana University, Bloomington, Indiana, USA
Judy Qiu

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Zhang, Y., Gao, Q., Gao, L., Wang, C. (2014). Asynchronous Computation Model for Large-Scale Iterative Computations. In: Li, X., Qiu, J. (eds) Cloud Computing for Data-Intensive Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1905-5_13

Download citation

DOI: https://doi.org/10.1007/978-1-4939-1905-5_13
Published: 15 November 2014
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-1904-8
Online ISBN: 978-1-4939-1905-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics