Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya


  • Marcel Kornacker
  • Alex Behm
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_253-1


Apache Impala is a modern massively parallel processing (MPP) SQL engine architected specifically for the Hadoop data processing environment. Impala provides low latency and high concurrency for business intelligence/analytic workloads, which are not served well by batch-oriented processing frameworks such as MapReduce or Spark. Unlike traditional commercial analytic DBMSs, Impala is not a monolithic system but instead relies on common, shared components of the Hadoop ecosystem to provide RDBMS-like functionality. In particular, Impala uses Hive’s Metastore as a system catalog, which it shares with other processing frameworks such as Hive and Spark. In addition, Impala does not provide persistence functionality itself; instead, it relies on storage management components of the Hadoop ecosystem (e.g., HDFS, Kudu), which it drives via their public APIs. These storage management components in effect implement the record service layer of a traditional RDBMS and define the degree to which...


Keywords: Impala; Azure Data Lake Store; Hive Metastore (HMS); Physical Schema Design; Query Coordinator
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. Behm A (2015) New in Cloudera Enterprise 5.5: support for complex types in Impala. https://blog.cloudera.com/blog/2015/11/new-in-cloudera-enterprise-5-5-support-for-complex-types-in-impala/
  2. Dean J, Ghemawat S (2004) MapReduce: simplified data processing on large clusters. In: Proceedings of the 6th symposium on operating systems design and implementation
  3. Graefe G (1990) Encapsulation of parallelism in the Volcano query processing system. In: Proceedings of the 1990 ACM SIGMOD international conference on management of data
  4. Lattner C, Adve V (2004) LLVM: a compilation framework for lifelong program analysis & transformation. In: Proceedings of the international symposium on code generation and optimization, 2004
  5. Melnik S, Gubarev A, Long JJ, Romer G, Shivakumar S, Tolton M, Vassilakis T (2010) Dremel: interactive analysis of web-scale datasets. Proc VLDB Endow 3(1):330–339
  6. Zaharia M et al (2010) Spark: cluster computing with working sets. In: 2nd USENIX workshop on hot topics in cloud computing

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Marcel Kornacker (1)
  • Alex Behm (2)
  1. Blink Computing, San Francisco, USA
  2. Databricks, San Francisco, USA

Section editors and affiliations

  • Yuanyuan Tian (1)
  • Fatma Özcan (2)
  1. IBM Almaden Research Center, San Jose, USA
  2. IBM Research – Almaden, San Jose, USA