Improving Hadoop Hive Query Response Times Through Efficient Virtual Resource Allocation

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 400)

Abstract

The performance of MapReduce-based Cloud data warehouses depends mainly on the virtual hardware resources allocated to them. In most cases, these resources are set to values selected by the Cloud service provider. However, setting the virtual resources, such as the number of CPUs, the size of RAM, and the network bandwidth, in accordance with the workload demands of a query will improve the response time when querying large data on an optimized system. In this study, we carried out a set of experiments with a well-known MapReduce SQL translator, Hadoop Hive, on the TPC-H decision support benchmark database in order to analyze the performance sensitivity of the queries under different virtual resource settings. Our results provide valuable hints for decision makers who design efficient MapReduce-based data warehouses on the Cloud.

Author information

Corresponding author

Correspondence to Tansel Dokeroglu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Dokeroglu, T., Cınar, M.S., Sert, S.A., Cosar, A., Yazıcı, A. (2016). Improving Hadoop Hive Query Response Times Through Efficient Virtual Resource Allocation. In: Andreasen, T., et al. Flexible Query Answering Systems 2015. Advances in Intelligent Systems and Computing, vol 400. Springer, Cham. https://doi.org/10.1007/978-3-319-26154-6_17

  • DOI: https://doi.org/10.1007/978-3-319-26154-6_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26153-9

  • Online ISBN: 978-3-319-26154-6

  • eBook Packages: Computer Science, Computer Science (R0)
