MapReduce Performance in Federated Cloud Computing Environments

Book: High Performance Cloud Auditing and Applications

Abstract

Large-scale scientific and engineering applications, as well as cloud auditing, generate huge amounts of data. The MapReduce framework coupled with cloud computing is emerging as a viable solution for distributed big data processing. Specifically, if data is generated from distributed sources and computation is also distributed, then multiple clouds need to be set up to minimize data transfer, which leads to federated distributed or multi-domain clouds. In addition to the security concerns of general clouds, distributed clouds pose new challenges to the performance of cloud-based applications, including cloud auditing and analysis. This chapter presents a method to deploy distributed clouds and evaluates the performance of various cloud-based applications over distributed clouds. It also proposes a method to optimize the performance of cloud-based applications over high-speed networks.

Acknowledgements

This material is based upon work partially supported by the National Science Foundation (NSF) GENI-grant and grant MRI-0821741 (CRON project), the Department of Defense Experimental Program to Stimulate Competitive Research (DEPSCoR) N0014-08-1-0856, and the Air Force Research Laboratory (AFRL) Visiting Faculty Research Program (VFRP) extension grant LRIR 11RI01COR. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agencies.

Author information

Corresponding author

Correspondence to Praveenkumar Kondikoppa.

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kondikoppa, P., Chiu, CH., Park, SJ. (2014). MapReduce Performance in Federated Cloud Computing Environments. In: Han, K., Choi, BY., Song, S. (eds) High Performance Cloud Auditing and Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3296-8_12

  • DOI: https://doi.org/10.1007/978-1-4614-3296-8_12

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-3295-1

  • Online ISBN: 978-1-4614-3296-8

  • eBook Packages: Engineering, Engineering (R0)
