Analyzing Performance of Apache Pig and Apache Hive with Hadoop

Bansal, Krati; Chawla, Priyanka; Kurle, Pratik

doi:10.1007/978-981-13-1642-5_4

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 478)

Abstract

Big Data refers to huge, highly complex datasets that are difficult to process using traditional devices. Analyzing such datasets calls for new technology. One of the best options is Apache Hadoop, whose components work together to provide an efficient and robust ecosystem. Apache Pig and Apache Hive are core components of the Hadoop ecosystem that facilitate the specification and search of processing tasks. Apache Hive makes it possible to run queries and manage huge datasets using simple, SQL-like commands. Apache Pig is a scripting platform that generates MapReduce programs for use with Hadoop. In our previous work, we analyzed and compared these two components to identify their benefits and drawbacks on the basis of a set of parameters. We have also presented an analysis of earlier research by various researchers. In this paper, we carry out the analysis by running both components, installed on Hadoop, with a large dataset as input.

Author information

Authors and Affiliations

Indicosmic Capital Pvt. Ltd., Mumbai, India
Krati Bansal
School of Computer Science and Engineering, Lovely Professional University, Phagwara, India
Priyanka Chawla
Mumbai University, Mumbai, Maharashtra, India
Pratik Kurle

Corresponding author

Correspondence to Krati Bansal.

Editor information

Editors and Affiliations

Amity School of Applied Sciences, Amity University Rajasthan, Jaipur, Rajasthan, India
Kanad Ray
Department of Electrical Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India
S. N. Sharan
Department of Electronics and Communication Engineering, Manipal University Jaipur, Jaipur, Rajasthan, India
Sanyog Rawat
Department of Physics, Manipal University Jaipur, Jaipur, Rajasthan, India
S. K. Jain
Department of Information Technology, Manipal University Jaipur, Jaipur, Rajasthan, India
Sumit Srivastava
ANCC, National Institute for Materials Science, Tsukuba, Ibaraki, Japan
Anirban Bandyopadhyay

About this paper

Cite this paper

Bansal, K., Chawla, P., Kurle, P. (2019). Analyzing Performance of Apache Pig and Apache Hive with Hadoop. In: Ray, K., Sharan, S., Rawat, S., Jain, S., Srivastava, S., Bandyopadhyay, A. (eds) Engineering Vibration, Communication and Information Processing. Lecture Notes in Electrical Engineering, vol 478. Springer, Singapore. https://doi.org/10.1007/978-981-13-1642-5_4

DOI: https://doi.org/10.1007/978-981-13-1642-5_4
Published: 31 October 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1641-8
Online ISBN: 978-981-13-1642-5
eBook Packages: Engineering, Engineering (R0)
