Skip to main content

Caching for SQL-on-Hadoop

  • Reference work entry
  • First Online:
Encyclopedia of Big Data Technologies
  • 86 Accesses

Definitions

Caching for SQL-on-Hadoop are techniques and systems which store data to provide faster access to that data, for Structured Query Language (SQL) engines running on the Apache Hadoop ecosystem.

Overview

The Apache Hadoop software project (Apache Hadoop 2018) has grown in popularity for distributed computing and big data. The Hadoop stack is widely used for storing large amounts of data, and for large-scale, distributed, and fault-tolerant data processing of that data. The Hadoop ecosystem has been important for organizations to extract actionable insight from the large volumes of collected data, which is difficult or infeasible for traditional data processing methods.

The main storage system for Hadoop is the Hadoop Distributed File System (HDFS). It is a distributed storage system which provides fault-tolerant and scalable storage. The main data processing framework for Hadoop is MapReduce, which is based on the Google MapReduce project (Dean and Ghemawat 2008). MapReduce...

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 849.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD 999.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Gene Pang .

Editor information

Editors and Affiliations

Section Editor information

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this entry

Check for updates. Verify currency and authenticity via CrossMark

Cite this entry

Pang, G., Li, H. (2019). Caching for SQL-on-Hadoop. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_249

Download citation

Publish with us

Policies and ethics