Impala

Kornacker, Marcel; Behm, Alex

doi:10.1007/978-3-319-77525-8_253



Overview

Apache Impala is a modern massively parallel processing (MPP) SQL engine architected specifically for the Hadoop data processing environment. Impala provides low latency and high concurrency for business intelligence (BI) and analytic workloads, which are not served well by batch-oriented processing frameworks such as MapReduce or Spark. Unlike traditional commercial analytic DBMSs, Impala is not a monolithic system. Instead, it relies on common components shared across the Hadoop ecosystem to provide RDBMS-like functionality. In particular, Impala uses Hive's Metastore as its system catalog, which it shares with other processing frameworks such as Hive and Spark. In addition, Impala does not provide persistence itself; instead, it relies on storage management components of the Hadoop ecosystem (e.g., HDFS, Kudu), which it drives via their public APIs. These storage management components in effect implement the record service layer of a traditional RDBMS and define the degree to which...
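The decoupled architecture described above (an engine that owns neither its catalog nor its storage) can be illustrated with a toy sketch. This is not Impala's code, nor the Hive Metastore, HDFS, or Kudu APIs; every class and method name here (SharedCatalog, TableMeta, StorageManager, FileStore, KeyValueStore, QueryEngine) is hypothetical, and the "storage managers" are in-memory stand-ins. The sketch only shows the division of labor: the engine resolves a table through a shared catalog, then drives whichever storage component owns the data through a narrow scan interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

Row = Tuple  # hypothetical: a row is a plain tuple of column values


@dataclass
class TableMeta:
    """Hypothetical catalog entry, standing in for what a shared metastore records."""
    name: str
    columns: List[str]
    storage: str   # which storage manager owns the data, e.g. "hdfs" or "kudu"
    location: str  # storage-specific locator (hypothetical format)


class SharedCatalog:
    """Hypothetical in-memory catalog that several engines could share (not the Hive Metastore API)."""
    def __init__(self) -> None:
        self._tables: Dict[str, TableMeta] = {}

    def register(self, meta: TableMeta) -> None:
        self._tables[meta.name] = meta

    def lookup(self, name: str) -> TableMeta:
        return self._tables[name]


class StorageManager(ABC):
    """Interface the engine drives; persistence lives behind it, not in the engine."""
    @abstractmethod
    def scan(self, location: str) -> Iterator[Row]: ...


class FileStore(StorageManager):
    """Toy stand-in for an HDFS-style file store: location -> list of rows."""
    def __init__(self, files: Dict[str, List[Row]]) -> None:
        self._files = files

    def scan(self, location: str) -> Iterator[Row]:
        yield from self._files[location]


class KeyValueStore(StorageManager):
    """Toy stand-in for a Kudu-style store: location -> {primary key: row}, scanned in key order."""
    def __init__(self, tablets: Dict[str, Dict[int, Row]]) -> None:
        self._tablets = tablets

    def scan(self, location: str) -> Iterator[Row]:
        tablet = self._tablets[location]
        for key in sorted(tablet):
            yield tablet[key]


class QueryEngine:
    """Plans and executes only; catalog and storage are external, shared components."""
    def __init__(self, catalog: SharedCatalog, storage: Dict[str, StorageManager]) -> None:
        self.catalog = catalog
        self.storage = storage

    def select(self, table: str, columns: List[str]) -> List[Row]:
        meta = self.catalog.lookup(table)               # resolve schema and location via the catalog
        idx = [meta.columns.index(c) for c in columns]  # map projected columns to positions
        backend = self.storage[meta.storage]            # pick the storage manager that owns the table
        return [tuple(row[i] for i in idx) for row in backend.scan(meta.location)]


catalog = SharedCatalog()
catalog.register(TableMeta("sales", ["id", "region", "amount"], "hdfs", "/warehouse/sales"))
catalog.register(TableMeta("users", ["id", "name"], "kudu", "users_tablet"))
engine = QueryEngine(catalog, {
    "hdfs": FileStore({"/warehouse/sales": [(1, "EU", 10.0), (2, "US", 25.5)]}),
    "kudu": KeyValueStore({"users_tablet": {2: (2, "bob"), 1: (1, "alice")}}),
})
print(engine.select("sales", ["region", "amount"]))  # [('EU', 10.0), ('US', 25.5)]
print(engine.select("users", ["name"]))              # [('alice',), ('bob',)]
```

Because the catalog is a separate object, a second engine constructed with the same `catalog` would see the same table definitions, which loosely mirrors how one metastore can serve several processing frameworks.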



Author information

Authors and Affiliations

Blink Computing, San Francisco, CA, USA
Marcel Kornacker
Databricks, San Francisco, CA, USA
Alex Behm


Editor information

Editors and Affiliations

Institute of Computer Science, University of Tartu, Tartu, Estonia
Sherif Sakr
School of Information Technologies, Sydney University, Sydney, Australia
Albert Y. Zomaya

Section Editor information

IBM Almaden Research Center, San Jose, CA, USA
Yuanyuan Tian
IBM Research – Almaden, San Jose, CA, USA
Fatma Özcan


About this entry

Cite this entry

Kornacker, M., Behm, A. (2019). Impala. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_253


DOI: https://doi.org/10.1007/978-3-319-77525-8_253
Published: 20 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77524-1
Online ISBN: 978-3-319-77525-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering
