Getting Started with Hadoop

  • K. G. Srinivasa
  • Anil Kumar Muppalla
Part of the Computer Communications and Networks book series (CCN)


Apache Hadoop is a software framework for the distributed processing of large datasets across clusters of computers using a simple programming model. It is designed to scale up from a single server to thousands of nodes, and to detect and handle failures at the application layer rather than relying on hardware for high availability, thereby delivering a highly available service on top of a cluster of commodity machines, each of which is prone to failure [2]. While Hadoop can run on a single machine, its true power lies in its ability to scale to thousands of computers, each with several processor cores, and to distribute large amounts of work across the cluster efficiently [1].
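The "simple programming model" referred to above is MapReduce [5]: a job is expressed as a map function that emits key–value pairs, a shuffle step that groups pairs by key, and a reduce function that aggregates each group. The following is a minimal, hedged sketch of that flow in plain Python, using the classic word-count example; it runs entirely in one process and involves no Hadoop APIs, so it illustrates only the shape of the model, not the framework's distribution or fault-tolerance machinery.

```python
from collections import defaultdict

def map_phase(records):
    # Mapper: emit a (word, 1) pair for every word in every input line,
    # as the canonical WordCount mapper does.
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group emitted values by key. In real Hadoop the framework
    # performs this step between the map and reduce tasks.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: aggregate each key's values -- here, sum the counts.
    return {key: sum(values) for key, values in groups.items()}

lines = ["Hadoop scales out", "Hadoop tolerates failures"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"])  # 2
```

Because the mapper and reducer are pure functions over independent records and groups, the framework is free to run many copies of each in parallel across a cluster and simply re-run a task on another node if one fails; this is what makes the application-level fault tolerance described above possible.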


Keywords: Data Block · Master Node · Replication Factor · Slave Node · Hadoop Distributed File System


References

  1. Tom White, 2012, Hadoop: The Definitive Guide, O'Reilly.
  2. Hadoop Tutorial, Yahoo! Developer Network.
  3. Mike Cafarella and Doug Cutting, April 2004, Building Nutch: Open Source Search, ACM Queue.
  4. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, October 2003, The Google File System.
  5. Jeffrey Dean and Sanjay Ghemawat, December 2004, MapReduce: Simplified Data Processing on Large Clusters.
  6. Yahoo! Launches World's Largest Hadoop Production Application, 19 February 2008.
  7. Derek Gottfrid, 1 November 2007, Self-service, Prorated Super Computing Fun!
  8. Google, 21 November 2008, Sorting 1PB with MapReduce.
  9. Gantz et al., March 2008, The Diverse and Exploding Digital Universe.
  10.
  11. David J. DeWitt and Michael Stonebraker, January 2007, MapReduce: A Major Step Backwards.
  12. Jim Gray, March 2003, Distributed Computing Economics.
  13.
  14.
  15. Jeffrey Dean and Sanjay Ghemawat, 2004, MapReduce: Simplified Data Processing on Large Clusters, Proc. Sixth Symposium on Operating Systems Design and Implementation (OSDI).
  16. Christopher Olston et al., "Pig Latin: A Not-So-Foreign Language for Data Processing," Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, ACM, 2008.
  17. Ashish Thusoo et al., "Hive: A Warehousing Solution over a Map-Reduce Framework," Proceedings of the VLDB Endowment 2.2 (2009): 1626–1629.
  18. Lars George, HBase: The Definitive Guide, O'Reilly Media, Inc., 2011.
  19. Patrick Hunt et al., "ZooKeeper: Wait-free Coordination for Internet-scale Systems," USENIX Annual Technical Conference, Vol. 8, 2010.
  20. Michael Hausenblas and Jacques Nadeau, "Apache Drill: Interactive Ad-Hoc Analysis at Scale," Big Data 1.2 (2013): 100–104.
  21. Dhruba Borthakur, "HDFS Architecture Guide," Hadoop Apache Project (2008).
  22.
  23. Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler, May 2010, The Hadoop Distributed File System, Proceedings of MSST 2010.
  24. [Online] Konstantin V. Shvachko, April 2010, HDFS Scalability: The Limits to Growth, pp. 6–16.
  25.
  26.
  27.
  28. Hadoop, Apache, "Apache Hadoop" (2011).

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. M.S. Ramaiah Institute of Technology, Bangalore, India
