Performance Analysis of Queries with Hive Optimized Data Models

  • Meghna Sharma
  • Jagdeep Kaur
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 597)

Abstract

Hive, a data warehouse tool, handles the processing of structured data in Hadoop. It sits on top of Hadoop and helps to analyze, query, and review Big Data. Using Hadoop MapReduce has drastically reduced the execution time of queries. This paper presents a detailed comparison of optimization techniques for data models, namely partitioning and bucketing, that improve the processing time of Hive queries. The implementation uses data from the New York Police Portal, with AWS services for storage and the Hive tool of the Hadoop ecosystem for querying. Partitioning showed a remarkable improvement in execution time.
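The two techniques named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not run Hive; it only mimics the idea that partitioning lets a query read one subset of the data instead of all of it, and that bucketing spreads rows over a fixed number of groups by key. The sample rows, column names, and function names are hypothetical and are not taken from the paper's dataset.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for police records; not data from the paper.
ROWS = [
    {"id": 1, "borough": "BRONX", "offense": "ROBBERY"},
    {"id": 2, "borough": "QUEENS", "offense": "BURGLARY"},
    {"id": 3, "borough": "BRONX", "offense": "ASSAULT"},
    {"id": 4, "borough": "BROOKLYN", "offense": "ROBBERY"},
]


def partition_by(rows, column):
    """Group rows by the value of `column`, like one directory per partition value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


def bucket_by(rows, column, num_buckets):
    """Place each row in bucket (key mod num_buckets); assumes integer keys."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[row[column] % num_buckets].append(row)
    return buckets


def query_full_scan(rows, borough):
    """Baseline: every row is examined. Returns (matches, rows scanned)."""
    matches = [r for r in rows if r["borough"] == borough]
    return matches, len(rows)


def query_partitioned(partitions, borough):
    """Partition pruning: only the matching partition is read."""
    part = partitions.get(borough, [])
    return list(part), len(part)


if __name__ == "__main__":
    parts = partition_by(ROWS, "borough")
    print(query_full_scan(ROWS, "BRONX")[1])     # rows scanned without partitioning
    print(query_partitioned(parts, "BRONX")[1])  # rows scanned with partitioning
```

Both queries return the same rows, but the partitioned one touches fewer of them, which is the effect the abstract attributes to partitioning. How large the gain is on real data depends on the partition column and the query predicates.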

Keywords

Big Data, Hadoop, Hive, Partitioning, Bucket methods

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. The NorthCap University, Gurugram, India