Abstract
Data (big or otherwise) is woven into the fabric of most businesses, and Big Data now directly drives corporate strategy. To maintain a competitive edge, most businesses try to run their analytics pipelines in near real time. Although this approach captures the behavior of a large class of applications that rely on unstructured data, it is not exhaustive: a significant share of data sources are structured, and the applications that analyze them require data-warehousing capabilities. One way to handle these requirements is to blend the existing Spark API with an external warehousing solution such as Hive, but this is a marriage of convenience rather than a natural fit: data must be copied back and forth, and two different APIs must be maintained. A better solution is Spark SQL.
Notes
- 1. Charles Clover, “Urban Population to Exceed 50 Percent,” The Telegraph, June 27, 2007, www.telegraph.co.uk/news/earth/earthnews/3298527/Urban-population-to-exceed-50-per-cent.html.
- 2. Joshua Pramis, “Number of Mobile Phones to Exceed World Population by 2014,” Digital Trends, February 28, 2013, www.digitaltrends.com/mobile/mobile-phone-world-population-2014/.
- 3. Open Big Data, Dandelion, https://dandelion.eu/datamine/open-big-data/.
- 5. A Parquet table is typically made up of more than one file, which may be located at multiple locations.
© 2016 Zubair Nabi
About this chapter
Cite this chapter
Nabi, Z. (2016). Real-Time ETL and Analytics Magic. In: Pro Spark Streaming. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-1479-4_8
DOI: https://doi.org/10.1007/978-1-4842-1479-4_8
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-1480-0
Online ISBN: 978-1-4842-1479-4
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)