Abstract
The Hadoop platform supports several data warehousing solutions, including Apache Hive, Impala, and Shark. These solutions are conceptually similar to relational databases at much larger scale but differ in their implementation and usage model. Relational databases are often used in transactional systems in which single row inserts, updates, and deletes must be executed atomically. Efficient indexing and referential integrity with primary/foreign keys allow modern relational databases to find records quickly and guarantee that all data satisfies a strict schema. Relational databases try to avoid full table scans whenever possible because I/O bandwidth is limited and tends to be the bottleneck in these systems.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Author information
Authors and Affiliations
Rights and permissions
Copyright information
© 2014 Sameer Wadkar and Madhu Siddalingaiah
About this chapter
Cite this chapter
Wadkar, S., Siddalingaiah, M. (2014). Data Warehousing Using Hadoop. In: Pro Apache Hadoop. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4302-4864-4_10
Download citation
DOI: https://doi.org/10.1007/978-1-4302-4864-4_10
Published:
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4302-4863-7
Online ISBN: 978-1-4302-4864-4
eBook Packages: Professional and Applied ComputingApress Access BooksProfessional and Applied Computing (R0)