Application on Big Data Processing

Wang, Dan; Han, Zhu

doi:10.1007/978-3-319-20448-2_4

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

Today’s lightning-fast generation of data from massive sources and advanced data analytics have made mining information from big data possible. We have witnessed the success of many big data applications. For example, Amazon uses its massive historical shipment tracking data to recommend goods to targeted customers, and Google uses billions of search queries to predict flu trends, sometimes one week earlier than the Centers for Disease Control and Prevention (CDC).

Notes

1. Note that the value is not reported, so the information the master node receives for each item requires only a small amount of space.
2. Here we implicitly assume that each pair represents a workload of unit size, but our algorithm also works readily for variable integer workload weights.

Author information

Authors and Affiliations

Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong, SAR
Dan Wang
Department of Engineering, University of Houston, Houston, TX, USA
Zhu Han

About this chapter

Cite this chapter

Wang, D., Han, Z. (2015). Application on Big Data Processing. In: Sublinear Algorithms for Big Data Applications. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-20448-2_4

DOI: https://doi.org/10.1007/978-3-319-20448-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20447-5
Online ISBN: 978-3-319-20448-2
eBook Packages: Computer Science, Computer Science (R0)
