Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

Today’s lightning-fast generation of data from massive sources, together with advanced data analytics, has made it possible to mine information from big data. We have witnessed the success of many big data applications. For example, Amazon uses its massive historical shipment tracking data to recommend goods to targeted customers, and Google uses billions of search queries to predict flu trends, sometimes one week earlier than the Centers for Disease Control and Prevention (CDC).

Notes

  1. Note that the value itself is not reported; thus, the information the master node receives for each item requires only a small amount of space.

  2. Here we implicitly assume that each pair represents a workload of unit size, but our algorithm extends easily to variable integer workload weights.


Copyright information

© 2015 The Author(s)

About this chapter

Cite this chapter

Wang, D., Han, Z. (2015). Application on Big Data Processing. In: Sublinear Algorithms for Big Data Applications. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-20448-2_4

  • DOI: https://doi.org/10.1007/978-3-319-20448-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20447-5

  • Online ISBN: 978-3-319-20448-2

  • eBook Packages: Computer Science, Computer Science (R0)