Introduction to Spark
The first version of Spark was open sourced in 2010, and the project entered Apache incubation in 2013. By early 2014, it had been promoted to a top-level Apache project. It has since displaced Hadoop MapReduce as the Big Data processing engine of choice in many organizations, a testament to its maturity and the richness of its design. Batch processing, iterative and interactive computation, stream processing, graph analytics, ETL, machine learning, and data warehousing: you name it, and Spark can already handle it. This chapter is a hands-on primer on Spark that sets the stage for the rest of the book.