Query Optimization Challenges for SQL-on-Hadoop

  • Living reference work entry
Encyclopedia of Big Data Technologies

Definition

In database management systems, the query optimizer is the component responsible for mapping an input query to the most efficient mechanism for executing it, called the query execution plan. Query execution is a resource-intensive operation that consumes the memory, I/O, and network bandwidth of the underlying database management system. The query optimizer builds a space of plan alternatives that captures the different ways of executing an input query, such as different orderings of the joins among the tables the query references. Each plan alternative is assessed using a cost model, which computes a cost estimate that predicts the plan's wall-clock running time. The optimizer picks the plan with the lowest estimated cost.
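The definition above can be illustrated with a minimal sketch of cost-based join ordering. This is a toy, not how any particular optimizer is implemented: the table names, row counts, and selectivities are hypothetical, the search is limited to left-deep join orders, and the cost model uses the total number of intermediate rows as a stand-in for predicted running time.

```python
from itertools import permutations

# Hypothetical table cardinalities (row counts); illustrative values only.
CARDINALITY = {"orders": 1_000_000, "customers": 50_000, "nations": 25}

# Hypothetical join selectivities for pairs of tables with a join predicate.
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "nations"}): 1 / 25,
}


def join_selectivity(joined, new_table):
    """Combined selectivity of joining new_table to the tables joined so far.

    A pair without a join predicate contributes 1.0, i.e. a cross product.
    """
    sel = 1.0
    for table in joined:
        sel *= SELECTIVITY.get(frozenset({table, new_table}), 1.0)
    return sel


def plan_cost(order):
    """Toy cost model: total rows produced by all intermediate joins."""
    rows = CARDINALITY[order[0]]
    joined = [order[0]]
    cost = 0.0
    for table in order[1:]:
        rows = rows * CARDINALITY[table] * join_selectivity(joined, table)
        cost += rows
        joined.append(table)
    return cost


def best_plan(tables):
    """Enumerate every left-deep join order and return the cheapest one."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    # Print each alternative with its estimated cost, then the chosen plan.
    for order in permutations(CARDINALITY):
        print(" JOIN ".join(order), "->", plan_cost(order))
    print("chosen:", " JOIN ".join(best_plan(CARDINALITY)))
```

Under these assumed statistics, orders that join `orders` directly with `nations` form a cross product and are estimated to be far more expensive than the others, which is the kind of difference a cost model exists to expose. Exhaustive enumeration grows factorially with the number of tables, so it is only workable for a handful of tables.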

Overview

The job of a query optimizer is to turn a user query into an efficient query execution plan. The optimizer typically generates the execution plan by considering a large space of possible alternative plans and...

References

  • Antova L, El-Helw A, Soliman MA, Gu Z, Petropoulos M, Waas F (2014) Optimizing queries over partitioned tables in MPP systems. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data

  • Armbrust M, Xin RS, Lian C, Huai Y, Liu D, Bradley JK, Meng X, Kaftan T, Franklin MJ, Ghodsi A, Zaharia M (2015) Spark SQL: relational data processing in spark. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data

  • Apache Calcite (2018) https://calcite.apache.org

  • El-Helw A, Raghavan V, Soliman MA, Caragea G, Gu Z, Petropoulos M (2015) Optimization of common table expressions in MPP database systems. Proc VLDB Endow 8:1704–1715

  • Graefe G (1995) The cascades framework for query optimization. IEEE Data Eng Bull 18(3):19–29

  • Kornacker M, Erickson J (2012) Cloudera impala: real-time queries in Apache Hadoop, for real. http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/impala.html

  • Orca Open Source (2018) https://github.com/greenplum-db/gporca

  • Pivotal (2018a) Greenplum database. http://greenplum.org/

  • Pivotal (2018b) HAWQ. http://hawq.incubator.apache.org/

  • Soliman MA, Antova L, Raghavan V, El-Helw A, Gu Z, Shen E, Caragea GC, Garcia-Alvarado C, Rahman F, Petropoulos M, Waas F, Narayanan S, Krikellas K, Baldwin R (2014) Orca: a modular query optimizer architecture for big data. In: Proceedings of the 2014 ACM SIGMOD international conference on management of data

Author information

Corresponding author

Correspondence to Mohamed A. Soliman.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this entry

Cite this entry

Soliman, M.A. (2018). Query Optimization Challenges for SQL-on-Hadoop. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_323-1

  • DOI: https://doi.org/10.1007/978-3-319-63962-8_323-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering
