
Real-Time ETL and Analytics Magic

  • Zubair Nabi
Chapter

Abstract

Data (big or otherwise) has been woven into the fabric of most businesses. The world is at a stage where Big Data directly drives corporate strategy. To maintain a competitive edge, most businesses try to run their analytics pipelines in near real time. Although such pipelines capture the behavior of a large class of applications that rely on unstructured data, they are not exhaustive: a significant share of data sources are structured, and the applications that analyze them require data-warehousing capabilities. One way to meet these requirements is to combine the existing Spark API with an external warehousing solution such as Hive, but this is a marriage of convenience rather than a natural fit: data must be copied back and forth, and two different APIs must be maintained. A better solution is Spark SQL.

Keywords

Data Frame, Physical Plan, Streaming Data, Streaming Application, External Database
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Zubair Nabi 2016

Authors and Affiliations

  • Zubair Nabi (1)
  1. Lahore, Pakistan
