Overview
Apache Impala is a modern MPP (massively parallel processing) SQL engine designed specifically for the Hadoop data processing environment. Impala provides low latency and high concurrency for business intelligence and analytic workloads, which are not served well by batch-oriented processing frameworks such as MapReduce or Spark. Unlike traditional commercial analytic DBMSs, Impala is not a monolithic system; instead, it relies on common, shared components of the Hadoop ecosystem to provide RDBMS-like functionality. In particular, Impala uses Hive’s Metastore as its system catalog, which it shares with other processing frameworks such as Hive and Spark. In addition, Impala does not provide persistence itself; instead, it relies on storage management components of the Hadoop ecosystem (e.g., HDFS, Kudu), which it drives via their public APIs. These storage management components in effect implement the record service layer of a traditional RDBMS and define the degree to which...
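The division of labor described in the overview can be pictured with a toy model. The Python sketch below is not Impala code and calls no Impala, Hive, HDFS, or Kudu API; every class and method name in it (`SharedCatalog`, `InMemoryStorage`, `QueryEngine`, `select`) is hypothetical and chosen only for illustration. It assumes a catalog that maps table names to a schema and a storage location, and a storage backend that exposes a scan operation, so that the query engine itself holds neither metadata nor data.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Protocol, Tuple

Row = Tuple[object, ...]


class StorageBackend(Protocol):
    """Hypothetical interface for a storage manager (HDFS and Kudu play this role)."""

    def scan(self, location: str) -> Iterator[Row]: ...


class InMemoryStorage:
    """Toy backend: persistence lives here, not in the query engine."""

    def __init__(self) -> None:
        self._data: Dict[str, List[Row]] = {}

    def write(self, location: str, rows: List[Row]) -> None:
        self._data.setdefault(location, []).extend(rows)

    def scan(self, location: str) -> Iterator[Row]:
        return iter(self._data.get(location, []))


@dataclass(frozen=True)
class TableEntry:
    columns: Tuple[str, ...]
    backend: str   # name of the storage manager that holds the data
    location: str  # where the data lives inside that backend


class SharedCatalog:
    """Toy stand-in for a system catalog shared by several frameworks."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableEntry] = {}

    def register(self, name: str, entry: TableEntry) -> None:
        self._tables[name] = entry

    def lookup(self, name: str) -> TableEntry:
        return self._tables[name]


class QueryEngine:
    """Owns no metadata and no data; it consults the catalog and drives storage."""

    def __init__(self, catalog: SharedCatalog, backends: Dict[str, StorageBackend]) -> None:
        self.catalog = catalog
        self.backends = backends

    def select(self, table: str, column: str, min_value) -> List[Row]:
        entry = self.catalog.lookup(table)                          # metadata from the catalog
        idx = entry.columns.index(column)
        rows = self.backends[entry.backend].scan(entry.location)    # data from storage
        return [row for row in rows if row[idx] >= min_value]


if __name__ == "__main__":
    storage = InMemoryStorage()
    storage.write("/warehouse/sales", [("a", 10), ("b", 25), ("c", 40)])
    catalog = SharedCatalog()
    catalog.register("sales", TableEntry(("item", "amount"), "toy", "/warehouse/sales"))
    # Two independent engines can answer queries over the same table
    # because neither one owns the catalog or the stored rows.
    print(QueryEngine(catalog, {"toy": storage}).select("sales", "amount", 20))
    print(QueryEngine(catalog, {"toy": storage}).select("sales", "item", "c"))
```

In this sketch a second engine can be attached to the same catalog and storage without copying anything, which is the property the overview attributes to sharing the Metastore and the storage layer with other frameworks.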
References
Behm A (2015) New in Cloudera Enterprise 5.5: support for complex types in Impala. https://blog.cloudera.com/blog/2015/11/new-in-cloudera-enterprise-5-5-support-for-complex-types-in-impala/
Centralized cache management in HDFS. https://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html
Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th symposium on operating systems design and implementation
Graefe G (1990) Encapsulation of parallelism in the Volcano query processing system. In: Proceedings of the 1990 ACM SIGMOD international conference on management of data
HDFS short-circuit local reads. http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html
Lattner C, Adve V (2004) LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of the international symposium on code generation and optimization, 2004
Melnik S, Gubarev A, Long JJ, Romer G, Shivakumar S, Tolton M, Vassilakis T (2010) Dremel: interactive analysis of web-scale datasets. Proc VLDB Endow 3(1):330–339
Zaharia M et al (2010) Spark: cluster computing with working sets. In: 2nd USENIX workshop on hot topics in cloud computing
Copyright information
© 2019 Springer Nature Switzerland AG
About this entry
Cite this entry
Kornacker, M., Behm, A. (2019). Impala. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_253
DOI: https://doi.org/10.1007/978-3-319-77525-8_253
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77524-1
Online ISBN: 978-3-319-77525-8
eBook Packages: Computer Science; Reference Module Computer Science and Engineering