
Performance Comparison Between Apache Hive and Oracle SQL for Big Data Analytics

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 614)

Abstract

Big data refers to volumes of data too massive to be stored, processed, and managed by traditional database management systems. Big data analytics has become a broad research area that attracts both academia and industry, which seek to extract knowledge and information from large amounts of data. Oracle SQL is a prominent DBMS used worldwide, but its query running time increases as the data grows. Apache Hive makes large-scale data analysis possible in a short time: it facilitates reading, writing, and managing big datasets in a distributed environment using SQL, whereas Oracle SQL provides an integrated development environment for running queries and scripts. In this paper, we run a few queries on both smaller and larger data sets and analyze the results in the Apache Hive and Oracle SQL environments.




Corresponding author

Correspondence to Rotsnarani Sethy.


Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Sethy, R., Dash, S.K., Panda, M. (2018). Performance Comparison Between Apache Hive and Oracle SQL for Big Data Analytics. In: Abraham, A., Cherukuri, A., Madureira, A., Muda, A. (eds) Proceedings of the Eighth International Conference on Soft Computing and Pattern Recognition (SoCPaR 2016). SoCPaR 2016. Advances in Intelligent Systems and Computing, vol 614. Springer, Cham. https://doi.org/10.1007/978-3-319-60618-7_14


  • DOI: https://doi.org/10.1007/978-3-319-60618-7_14


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60617-0

  • Online ISBN: 978-3-319-60618-7

  • eBook Packages: Engineering, Engineering (R0)
