Query Optimization Challenges for SQL-on-Hadoop

Soliman, Mohamed A.

doi:10.1007/978-3-319-63962-8_323-1


Definition

In database management systems, the query optimizer is the component responsible for mapping an input query to the most efficient way of executing it, called a query execution plan. Query execution is resource-intensive: it consumes the memory, I/O, and network bandwidth of the underlying database management system. The query optimizer builds a space of alternative plans that capture the different ways of executing an input query, such as different orderings of the joins among the tables the query references. Each alternative is assessed with a cost model that computes a cost estimate, a prediction of the plan's wall-clock running time. The optimizer then picks the plan that is most efficient according to these estimates, that is, the one with the lowest estimated cost.
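To make the definition concrete, here is a minimal Python sketch of the loop it describes: enumerate a plan space (here only left-deep join orders), score each plan with a cost model, and keep the cheapest. Everything specific in it is an assumption for illustration. The table names, row counts, and selectivities are made up. The cost model sums the sizes of intermediate results (often called C_out) instead of predicting wall-clock time from memory, I/O, and network estimates as the definition describes. Exhaustive enumeration is only feasible for a handful of tables, so practical optimizers prune the space rather than listing every plan.

```python
from itertools import permutations
from math import prod

# Hypothetical statistics: estimated row counts per table.
CARDINALITY = {"orders": 1_000_000, "customer": 50_000, "nation": 25}

# Hypothetical join selectivities for table pairs that share a join predicate.
# Pairs without a predicate are treated as cross products (selectivity 1.0).
SELECTIVITY = {
    frozenset({"orders", "customer"}): 1 / 50_000,
    frozenset({"customer", "nation"}): 1 / 25,
}


def join_cardinality(tables):
    """Estimate the row count of joining `tables`, assuming independent predicates."""
    rows = prod(CARDINALITY[t] for t in tables)
    for pair, sel in SELECTIVITY.items():
        if pair <= set(tables):  # apply every predicate whose tables are all present
            rows *= sel
    return rows


def plan_cost(order):
    """Toy cost model: total rows produced by all intermediate joins (C_out).

    A real optimizer would translate CPU, I/O, and network estimates into a
    predicted running time; summing intermediate sizes is a common proxy.
    """
    cost = 0.0
    for i in range(2, len(order) + 1):
        cost += join_cardinality(order[:i])
    return cost


def best_left_deep_plan(tables):
    """Enumerate every left-deep join order and return the cheapest one."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    plan = best_left_deep_plan(list(CARDINALITY))
    print(" JOIN ".join(plan), plan_cost(plan))
```

With these hypothetical statistics the sketch joins customer with nation first and orders last, because that order keeps the first intermediate result at about 50,000 rows, whereas starting with orders and nation produces a 25-million-row cross product.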

Overview

The job of a query optimizer is to turn a user query into an efficient query execution plan. The optimizer typically generates the execution plan by considering a large space of possible alternative plans and...
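How large this space gets can be seen from join ordering alone. The standard counts below assume that any two tables may be joined (cross products allowed) and that the two inputs of a join are distinguished. They ignore further physical choices, such as the join algorithm, that multiply the space again. With $n$ tables and $C_k$ the $k$-th Catalan number:

```latex
\underbrace{n!}_{\text{left-deep join orders}}
\qquad\text{and}\qquad
n!\,C_{n-1} \;=\; n!\,\frac{(2n-2)!}{n!\,(n-1)!} \;=\; \frac{(2n-2)!}{(n-1)!}
\quad \text{bushy join trees},
\qquad C_k = \frac{(2k)!}{(k+1)!\,k!}.
```

For $n = 10$ this gives $10! = 3{,}628{,}800$ left-deep orders and $18!/9! = 17{,}643{,}225{,}600$ bushy trees, one reason optimizers rely on pruning and memoization rather than costing plans one by one.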



Author information

Authors and Affiliations

Datometry, Inc., San Francisco, CA, USA
Mohamed A. Soliman


Corresponding author

Correspondence to Mohamed A. Soliman.

Editor information

Editors and Affiliations

School of Comp. Sci. and Engineering, University of New South Wales, Eveleigh, New South Wales, Australia
Sherif Sakr
School of Information Technologies, Building J12, University of Sydney, Sydney, Australia
Albert Zomaya

Section Editor information

IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, United States
Yuanyuan Tian
IBM Research – Almaden, San Jose, CA, USA
Fatma Özcan


About this entry

Cite this entry

Soliman, M.A. (2018). Query Optimization Challenges for SQL-on-Hadoop. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_323-1


DOI: https://doi.org/10.1007/978-3-319-63962-8_323-1
Published: 23 April 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63962-8
Online ISBN: 978-3-319-63962-8
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering
