Hadoop

Network Data Analytics

Part of the book series: Computer Communications and Networks (CCN)

Abstract

The data analytics lifecycle involves several phases, such as variable selection, hypothesis development, analytics, and visualization. These phases were discussed in the previous section. This chapter presents Hadoop, one of the main platforms used for data analytics. Hadoop is an open-source project of the Apache Software Foundation. It is used to process large volumes of data in a distributed manner and follows a master–slave architecture. The MapReduce programming model is used to process the files stored in Hadoop. The chapter discusses Hadoop and MapReduce programming with examples. Its highlights include case studies, such as retail analytics and network log analytics, implemented in Hadoop with MapReduce.

Author information

Correspondence to K. G. Srinivasa.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Srinivasa, K.G., G. M., S., H., S. (2018). Hadoop. In: Network Data Analytics. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-77800-6_2

  • DOI: https://doi.org/10.1007/978-3-319-77800-6_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-77799-3

  • Online ISBN: 978-3-319-77800-6

  • eBook Packages: Computer Science, Computer Science (R0)
