A Comprehensive Survey on Data-Intensive Computing and MapReduce Paradigm in Cloud Computing Environments

  • Conference paper
  • In: Informatics and Communication Technologies for Societal Development

Abstract

With the emergence of cloud computing and other web technologies, data storage has become easy and cheap. As a result, petabytes and even zettabytes of data enter cloud storage daily through portals such as YouTube and Facebook. This creates a need for big data analysis and data-intensive computing. In this paper, we present a comprehensive survey of data-intensive computing techniques and MapReduce paradigm mechanisms in cloud computing. We first describe data-intensive computing methodologies and then describe the MapReduce algorithm for data-intensive computing and big data analysis.
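
For readers unfamiliar with the MapReduce paradigm named above, the following is a minimal single-process sketch of its map, shuffle, and reduce phases, using word counting as the example task. It is an illustration only and is not taken from the paper: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions, and a real framework such as Hadoop would distribute these phases across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of text records (hypothetical driver)."""
    intermediate = []
    for doc in documents:
        intermediate.extend(map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, chosen only for illustration.
counts = word_count([
    "big data in the cloud",
    "data intensive computing in the cloud",
])
```

Because each call to `map_phase` depends only on its own record, and each call to `reduce_phase` depends only on one key's values, both phases can in principle run in parallel, which is the property that makes the model attractive for data-intensive workloads.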

Acknowledgment

The authors would like to acknowledge Dr. Ganesh Neelakanta Iyer for his valuable comments in completing this research work.

Corresponding author

Correspondence to Girish Neelakanta Iyer.

Copyright information

© 2015 Springer India

About this paper

Cite this paper

Iyer, G.N., Silas, S. (2015). A Comprehensive Survey on Data-Intensive Computing and MapReduce Paradigm in Cloud Computing Environments. In: Rajsingh, E., Bhojan, A., Peter, J. (eds) Informatics and Communication Technologies for Societal Development. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1916-3_9

  • DOI: https://doi.org/10.1007/978-81-322-1916-3_9

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-1915-6

  • Online ISBN: 978-81-322-1916-3

  • eBook Packages: Engineering, Engineering (R0)
