Abstract
Iterative algorithms are common in machine learning and data mining applications. To scale to massive data sets, these algorithms have to be implemented in a large-scale distributed environment. However, synchronous iterations can perform unexpectedly poorly because of stragglers in a heterogeneous distributed environment, especially in a cloud environment. To bypass the synchronization barriers in iterative computations, this chapter introduces an asynchronous iteration model, delta-based accumulative iterative computation (DAIC). Unlike traditional iterative computations, which update the result based on the result of the previous iteration, DAIC updates the result asynchronously by accumulating the “changes” between iterations. This chapter presents a general asynchronous computation model to describe DAIC and introduces Maiter, a distributed framework for asynchronous iteration. The experimental results show that Maiter outperforms many other state-of-the-art frameworks.
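As a rough illustration of accumulating “changes” rather than recomputing from the previous iteration, the sketch below contrasts the two styles using PageRank as an assumed example. The function names, the damping factor d = 0.8, the tolerance, and the single-process shuffled update order (a stand-in for asynchronous scheduling) are illustrative choices, not Maiter's API or the chapter's exact formulation.

```python
import random


def pagerank_sync(graph, d=0.8, iters=200):
    """Traditional style: every round recomputes all ranks from the previous round."""
    rank = {v: 1.0 - d for v in graph}
    for _ in range(iters):
        nxt = {v: 1.0 - d for v in graph}
        for u, outs in graph.items():
            for w in outs:
                nxt[w] += d * rank[u] / len(outs)
        rank = nxt  # implicit barrier: the next round starts only after all updates
    return rank


def pagerank_delta(graph, d=0.8, tol=1e-12, seed=0):
    """Delta-based accumulative style: each vertex folds its pending change
    into its value and forwards a share of that change to its out-neighbors.
    Vertices are visited in a shuffled order, with no per-round barrier."""
    rng = random.Random(seed)
    value = {v: 0.0 for v in graph}        # accumulated result so far
    delta = {v: 1.0 - d for v in graph}    # pending, not yet applied change
    order = list(graph)
    while sum(delta.values()) > tol:       # stop once pending changes are negligible
        rng.shuffle(order)                 # arbitrary update order (assumption)
        for u in order:
            change, delta[u] = delta[u], 0.0
            value[u] += change             # accumulate the change
            for w in graph[u]:
                delta[w] += d * change / len(graph[u])  # propagate the change
    return value


if __name__ == "__main__":
    # Hypothetical toy graph: vertex -> list of out-neighbors.
    toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(pagerank_sync(toy))
    print(pagerank_delta(toy))
```

Under these assumptions both functions approach the same fixed point; the delta-based version never needs a consistent snapshot of the previous iteration, which is what makes barrier-free execution possible.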
Notes
- 1.
- 2. Open MPI. http://www.open-mpi.org/.
- 3. More example implementation code is available on Maiter’s Google Code website, http://code.google.com/p/maiter/.
- 4.
- 5. The details of the synthetic data sets can be found in [33].
References
S. Baluja, R. Seth, D. Sivakumar, Y. Jing, J. Yagnik, S. Kumar, D. Ravichandran, and M. Aly. Video suggestion and discovery for youtube: taking random walks through the view graph. In Proc. Int’l Conf. World Wide Web (WWW ‘08), pages 895–904, 2008.
G. M. Baudet. Asynchronous iterative methods for multiprocessors. J. ACM, 25:226–244, April 1978.
D. P. Bertsekas. Distributed asynchronous computation of fixed points. Math. Programming, 27:107–120, 1983.
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst., 30:107–117, April 1998.
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proc. Seventh Int’l Conf. World Wide Web (WWW ‘98), pages 107–117, 1998.
Y. Bu, B. Howe, M. Balazinska, and D. M. Ernst. Haloop: Efficient iterative data processing on large clusters. Proc. VLDB Endow., 3(1), 2010.
K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst., 3(1):63–75, Feb. 1985.
D. Chazan and W. Miranker. Chaotic relaxation. Linear Algebra and its Applications, 2(2):199–222, 1969.
J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: a runtime for iterative mapreduce. In Proc. IEEE Int’l Workshop MapReduce (MapReduce ‘10), pages 810–818, 2010.
A. Frommer and D. B. Szyld. On asynchronous iterations. J. Comput. Appl. Math., 123:201–216, November 2000.
K. Kambatla, N. Rapolu, S. Jagannathan, and A. Grama. Asynchronous algorithms in mapreduce. In Proc. IEEE Conf. Cluster (Cluster ‘10), pages 245–254, 2010.
U. Kang, C. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system implementation and observations. In Proc. IEEE Int’l Conf. Data Mining (ICDM ‘09), pages 229–238, 2009.
L. Katz. A new status index derived from sociometric analysis. Psychometrika, 1953.
J. M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM, 46:604–632, 1999.
G. Kollias, E. Gallopoulos, and D. B. Szyld. Asynchronous iterative computations with web information retrieval structures: The pagerank case. In PARCO, volume 33 of John von Neumann Institute for Computing Series, pages 309–316, 2005.
D. Lee and S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401:788–791, 1999.
D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. Technol., 58:1019–1031, May 2007.
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed graphlab: A framework for machine learning and data mining in the cloud. Proc. VLDB Endow., 5(8), 2012.
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proc. ACM SIGMOD, pages 135–146, 2010.
F. McSherry. A uniform approach to accelerated pagerank computation. In Proc. Int’l Conf. World Wide Web (WWW ‘05), pages 575–582, 2005.
F. McSherry, D. Murray, R. Isaacs, and M. Isard. Differential dataflow. In Proc. Biennial Conf. Innovative Data Systems Research (CIDR ‘13), 2013.
J. C. Miellou, D. El Baz, and P. Spiteri. A new class of asynchronous iterative algorithms with order intervals. Math. Comput., 67:237–255, January 1998.
D. G. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy, and S. Hand. Ciel: A universal execution engine for distributed data-flow computing. In Proc. USENIX Symp. Networked Systems Design and Implementation (NSDI ‘11), 2011.
R. Power and J. Li. Piccolo: Building fast, distributed programs with partitioned tables. In Proc. USENIX Symp. Operating Systems Design and Implementation (OSDI ‘10), 2010.
S. R. Mihaylov, Z. G. Ives, and S. Guha. Rex: Recursive, delta-based data-centric computation. Proc. VLDB Endow., 5(8), 2012.
H. H. Song, T. W. Cho, V. Dave, Y. Zhang, and L. Qiu. Scalable proximity estimation and link prediction in online social networks. In Proc. Int’l Conf. Internet Measurement (IMC ‘09), pages 322–335, 2009.
G. Wang, W. Xie, A. Demers, and J. Gehrke. Asynchronous large-scale graph processing made easy. In Proc. Biennial Conf. Innovative Data Systems Research (CIDR ‘13), 2013.
E. Yom-Tov and N. Slonim. Parallel pairwise clustering. In Proc. SIAM Intl. Conf. Data Mining (SDM ‘09), pages 745–755, 2009.
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Graphlab: A new framework for parallel machine learning. In Proc. Int’l Conf. Uncertainty in Artificial Intelligence (UAI ‘10), 2010.
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proc. USENIX Symp. Networked Systems Design and Implementation (NSDI ‘12), 2012.
Y. Zhang, Q. Gao, L. Gao, and C. Wang. Priter: a distributed framework for prioritized iterative computations. In Proc. ACM Symp. Cloud Computing (SOCC ‘11), 2011.
Y. Zhang, Q. Gao, L. Gao, and C. Wang. imapreduce: A distributed computing framework for iterative computation. J. Grid Comput., 10(1):47–68, 2012.
Y. Zhang, Q. Gao, L. Gao, and C. Wang. Maiter: A message-passing distributed framework for accumulative iterative computation, http://faculty.neu.edu.cn/cc/zhangyf/papers/maiter-full.pdf. Northeastern University technical report, 2012.
Acknowledgements
This work was partially supported by U.S. NSF grants (CNS-1217284, CCF-1018114), the National Natural Science Foundation of China (61300023), and the Fundamental Research Funds for the Central Universities (N120416001, N120416001).
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Zhang, Y., Gao, Q., Gao, L., Wang, C. (2014). Asynchronous Computation Model for Large-Scale Iterative Computations. In: Li, X., Qiu, J. (eds) Cloud Computing for Data-Intensive Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-1905-5_13
DOI: https://doi.org/10.1007/978-1-4939-1905-5_13
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-1904-8
Online ISBN: 978-1-4939-1905-5
eBook Packages: Computer Science, Computer Science (R0)