Query Optimization Challenges for SQL-on-Hadoop
In database management systems, the query optimizer is the component responsible for mapping an input query to the most efficient way of executing it, called a query execution plan. Query execution is a resource-intensive operation that consumes the memory, I/O, and network bandwidth of the underlying database management system. The query optimizer builds a space of plan alternatives that captures the different ways of executing an input query, such as different orderings of the joins among the tables referenced by the query. Each plan alternative is assessed using a cost model that computes a cost estimate, which is a prediction of the plan’s wall-clock running time. The optimizer picks the plan with the lowest cost estimate.
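The enumerate-then-cost loop described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the table names, row counts, join selectivities, and the toy cost model (the sum of estimated intermediate result sizes) do not come from any particular system, and a production optimizer would prune the search space rather than enumerate every join order.

```python
from itertools import permutations

# Hypothetical table cardinalities (row counts), chosen only for illustration.
CARDINALITY = {"orders": 1_000_000, "customers": 50_000, "nations": 25}

# Hypothetical selectivities of the join predicates between pairs of tables.
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "nations"}): 1 / 25,
}


def join_selectivity(joined, new_table):
    """Combined selectivity of joining new_table to the tables joined so far.

    A pair without a join predicate contributes 1.0, i.e. a cross product.
    """
    sel = 1.0
    for table in joined:
        sel *= SELECTIVITY.get(frozenset({table, new_table}), 1.0)
    return sel


def plan_cost(order):
    """Toy cost model: sum of the estimated sizes of all intermediate results."""
    rows = CARDINALITY[order[0]]
    cost = 0.0
    for i in range(1, len(order)):
        rows = rows * CARDINALITY[order[i]] * join_selectivity(order[:i], order[i])
        cost += rows
    return cost


def best_plan(tables):
    """Enumerate every left-deep join order and keep the cheapest estimate."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    tables = ["orders", "customers", "nations"]
    for order in permutations(tables):
        print(" JOIN ".join(order), "estimated cost:", plan_cost(order))
    print("chosen plan:", " JOIN ".join(best_plan(tables)))
```

Under these assumed statistics, the sketch prefers orders that join the two small tables first and bring in the large table last, because that keeps the intermediate results small. Exhaustive enumeration grows factorially with the number of tables, which is why practical optimizers rely on dynamic programming or rule-based search instead.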
The job of a query optimizer is to turn a user query into an efficient query execution plan. The optimizer typically generates the execution plan by considering a large space of possible alternative plans and...