Encyclopedia of Big Data Technologies

2019 Edition
Editors: Sherif Sakr, Albert Y. Zomaya

Impala

  • Marcel Kornacker
  • Alex Behm
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-77525-8_253

Overview

Apache Impala is a modern massively parallel processing (MPP) SQL engine architected specifically for the Hadoop data processing environment. Impala provides low latency and high concurrency for business intelligence/analytic workloads, which are not served well by batch-oriented processing frameworks such as MapReduce or Spark. Unlike traditional commercial analytic DBMSs, Impala is not a monolithic system; instead, it relies on common, shared components of the Hadoop ecosystem to provide RDBMS-like functionality. In particular, Impala uses Hive’s Metastore as its system catalog, which it shares with other processing frameworks such as Hive and Spark. In addition, Impala does not provide persistence functionality itself; instead, it relies on storage management components of the Hadoop ecosystem (e.g., HDFS, Kudu), which it drives via their public APIs. These storage management components in effect implement the record service layer of a traditional RDBMS and define the degree to which...
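The division of labor described above, in which a query engine delegates its catalog and its persistence to shared components, can be illustrated with a minimal Python sketch. Nothing here is Impala code or a real Impala, Hive Metastore, HDFS, or Kudu API: `SharedCatalog`, `InMemoryStorage`, `TableMeta`, and `QueryEngine` are hypothetical names invented for illustration, and the in-memory storage only stands in for a real storage manager.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Protocol, Tuple

Row = Tuple[object, ...]


class StorageBackend(Protocol):
    """Hypothetical interface a storage manager exposes to query engines."""

    def scan(self, location: str) -> Iterable[Row]: ...


@dataclass
class InMemoryStorage:
    """Toy stand-in for a storage component (assumption, not HDFS or Kudu)."""

    files: Dict[str, List[Row]] = field(default_factory=dict)

    def scan(self, location: str) -> Iterable[Row]:
        return iter(self.files.get(location, []))


@dataclass
class TableMeta:
    columns: List[str]
    backend: str   # which storage component holds the data
    location: str  # where the data lives inside that component


@dataclass
class SharedCatalog:
    """Toy stand-in for a catalog shared by several engines (assumption)."""

    tables: Dict[str, TableMeta] = field(default_factory=dict)

    def register(self, name: str, meta: TableMeta) -> None:
        self.tables[name] = meta

    def lookup(self, name: str) -> TableMeta:
        return self.tables[name]


class QueryEngine:
    """An engine that owns neither metadata nor data; it only processes."""

    def __init__(self, catalog: SharedCatalog,
                 backends: Dict[str, StorageBackend]) -> None:
        self.catalog = catalog
        self.backends = backends

    def select(self, table: str, column: str,
               predicate: Optional[Callable[[Row], bool]] = None) -> List[object]:
        meta = self.catalog.lookup(table)            # metadata from the catalog
        idx = meta.columns.index(column)
        rows = self.backends[meta.backend].scan(meta.location)  # data from storage
        return [r[idx] for r in rows if predicate is None or predicate(r)]


# Two independent engines share one catalog and one storage component.
catalog = SharedCatalog()
storage = InMemoryStorage({"/warehouse/sales": [("east", 10), ("west", 25)]})
catalog.register("sales", TableMeta(["region", "amount"], "mem", "/warehouse/sales"))

engine_a = QueryEngine(catalog, {"mem": storage})
engine_b = QueryEngine(catalog, {"mem": storage})
```

Because both engines resolve tables through the same catalog object and read through the same storage interface, a table registered once is visible to each of them. This is only meant to mirror, in miniature, the sharing between processing frameworks that the overview describes.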

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Marcel Kornacker (1)
  • Alex Behm (2)
  1. Blink Computing, San Francisco, USA
  2. Databricks, San Francisco, USA

Section editors and affiliations

  • Yuanyuan Tian (1)
  • Fatma Özcan (2)
  1. IBM Almaden Research Center, San Jose, USA
  2. IBM Research – Almaden, San Jose, USA