Abstract
The data analytics lifecycle involves many steps, such as variable selection, hypothesis development, analysis, and visualization. These phases were discussed in the previous section. This chapter presents Hadoop, one of the main platforms used for data analytics. Hadoop is an open-source project of the Apache Software Foundation. It is used to process large volumes of data in a distributed manner and follows a master–slave architecture. The MapReduce programming model is used to process the files stored in Hadoop. The chapter discusses Hadoop and MapReduce programming with examples, and its highlights include case studies such as retail analytics and network log analytics in Hadoop with MapReduce.
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Srinivasa, K.G., G. M., S., H., S. (2018). Hadoop. In: Network Data Analytics. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-77800-6_2
DOI: https://doi.org/10.1007/978-3-319-77800-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77799-3
Online ISBN: 978-3-319-77800-6
eBook Packages: Computer Science (R0)