Skip to main content

Data Warehousing Using Hadoop

  • Chapter
  • First Online:
Pro Apache Hadoop
  • 3542 Accesses

Abstract

The Hadoop platform supports several data warehousing solutions, including Apache Hive, Impala, and Shark. These solutions are conceptually similar to relational databases at much larger scale but differ in their implementation and usage model. Relational databases are often used in transactional systems in which single row inserts, updates, and deletes must be executed atomically. Efficient indexing and referential integrity with primary/foreign keys allow modern relational databases to find records quickly and guarantee that all data satisfies a strict schema. Relational databases try to avoid full table scans whenever possible because I/O bandwidth is limited and tends to be the bottleneck in these systems.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

eBook
USD 16.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 44.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Sameer Wadkar and Madhu Siddalingaiah

About this chapter

Cite this chapter

Wadkar, S., Siddalingaiah, M. (2014). Data Warehousing Using Hadoop. In: Pro Apache Hadoop. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4302-4864-4_10

Download citation

Publish with us

Policies and ethics