Definition
In database management systems, the query optimizer is the component responsible for mapping an input query to the most efficient way of executing it, called a query execution plan. Query execution is a resource-intensive operation that consumes the memory, I/O, and network bandwidth of the underlying database management system. The query optimizer builds a space of plan alternatives that capture the different ways of executing an input query, such as different orderings of the joins among the tables referenced by the query. Each plan alternative is assessed with a cost model that computes a cost estimate predicting the plan’s wall-clock running time. The optimizer then picks the most efficient execution plan according to these cost estimates.
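The enumerate, cost, and pick loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method of any particular optimizer: the table names, cardinalities, and join selectivities are hypothetical, the plan space is restricted to left-deep join orders, and the cost model is a simple stand-in (the sum of estimated intermediate-result sizes) rather than a real prediction of wall-clock running time.

```python
from itertools import permutations

# Hypothetical base-table cardinalities (rows); illustrative numbers only.
CARDINALITY = {"orders": 1_000_000, "customers": 50_000, "nation": 25}

# Hypothetical selectivities of the join predicates between table pairs.
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "nation"}): 1 / 25,
}


def join_selectivity(joined_tables, new_table):
    """Combined selectivity of all predicates linking new_table to the
    tables joined so far. A pair with no predicate contributes 1.0,
    which makes that join a cross product."""
    sel = 1.0
    for t in joined_tables:
        sel *= SELECTIVITY.get(frozenset({t, new_table}), 1.0)
    return sel


def plan_cost(order):
    """Toy cost model: sum of the estimated sizes of the intermediate
    results produced by a left-deep join order (a proxy for running time)."""
    joined = [order[0]]
    rows = CARDINALITY[order[0]]
    cost = 0.0
    for table in order[1:]:
        rows = rows * CARDINALITY[table] * join_selectivity(joined, table)
        cost += rows
        joined.append(table)
    return cost


def choose_plan(tables):
    """Enumerate every left-deep join order and keep the cheapest one."""
    return min(permutations(tables), key=plan_cost)


best = choose_plan(list(CARDINALITY))
print(best, plan_cost(best))
```

Under these made-up numbers, the cheapest orders join `customers` with `nation` first (estimated cost about 1,050,000), while any order that joins `orders` with `nation` directly pays for a 25-million-row cross product. Real cost models also account for join algorithms, access paths, and resource usage, which this sketch deliberately ignores.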
Overview
The job of a query optimizer is to turn a user query into an efficient query execution plan. The optimizer typically generates the execution plan by considering a large space of possible alternative plans and...
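To give a sense of how large the space of alternative plans can be, the standard combinatorial counts for join trees over n tables are sketched below. These formulas are general facts about binary join trees, not figures from this entry; they assume cross products are allowed, count A join B and B join A as distinct, and ignore every other choice (join algorithms, access paths, data movement) that multiplies the space further.

```python
from math import factorial


def left_deep_orders(n: int) -> int:
    """Number of left-deep join trees over n tables: one per permutation."""
    return factorial(n)


def bushy_trees(n: int) -> int:
    """Number of bushy join trees over n tables, counting A join B and
    B join A as distinct: n! * Catalan(n - 1) = (2n - 2)! / (n - 1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)


for n in (3, 5, 10):
    print(n, left_deep_orders(n), bushy_trees(n))
```

For ten tables this gives 3,628,800 left-deep orders and 17,643,225,600 bushy trees, which is one reason optimizers typically rely on pruning, memoization, or heuristics rather than exhaustive enumeration.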
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this entry
Cite this entry
Soliman, M.A. (2018). Query Optimization Challenges for SQL-on-Hadoop. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_323-1
DOI: https://doi.org/10.1007/978-3-319-63962-8_323-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering