Large-Scale Data Analytics Tools: Apache Hive, Pig, and HBase

Data Science and Big Data Computing

Abstract

Apache Hadoop is an open-source project that enables distributed processing of very large data sets across clusters of computers using simple programming models. It is designed to store, analyze, and access massive amounts of data quickly on clusters of commodity hardware, which has made the Hadoop ecosystem a cost-effective way of working with large data sets. Hadoop imposes a particular programming model, MapReduce, which breaks computation into units that can be distributed across a cluster of commodity and server-class hardware, thereby providing cost-effective horizontal scalability. The ecosystem includes several large-scale data processing tools, each with its own purpose. This chapter introduces the Hadoop ecosystem tools Apache Hive, Pig, and HBase and describes the role each plays in a data analytics environment.
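
To make the MapReduce model mentioned in the abstract concrete, the following is a minimal single-process sketch of its three stages (map, shuffle, reduce) applied to a word count. It is not taken from the chapter and does not use the Hadoop API; the function names map_phase, shuffle, reduce_phase, and run_job are hypothetical, chosen only for illustration. On a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence (illustrative, single process)."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: three short records standing in for lines of a large file.
counts = run_job(["hive pig hbase", "pig hive", "hive"])
print(counts)
```

Because each record is mapped independently and each key is reduced independently, both stages can be split into units and spread across machines, which is the property the abstract refers to as horizontal scalability.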

Author information

Corresponding author

Correspondence to N. Maheswari.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Maheswari, N., Sivagami, M. (2016). Large-Scale Data Analytics Tools: Apache Hive, Pig, and HBase. In: Mahmood, Z. (ed.) Data Science and Big Data Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-31861-5_9

  • DOI: https://doi.org/10.1007/978-3-319-31861-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31859-2

  • Online ISBN: 978-3-319-31861-5

  • eBook Packages: Computer Science; Computer Science (R0)
