Big Data Processing Using Spark in Cloud

  • Mamta Mittal
  • Valentina E. Balas
  • Lalit Mohan Goyal
  • Raghvendra Kumar

Part of the Studies in Big Data book series (SBD, volume 43)

Table of contents

  1. Front Matter
    Pages i-xiii
  2. Kamalinder Kaur, Vishal Bharti
    Pages 1-22
  3. Ankita Bansal, Roopal Jain, Kanika Modi
    Pages 23-50
  4. Neha Sharma, Madhavi Shamkuwar
    Pages 51-85
  5. Archana Singh, Mamta Mittal, Namita Kapoor
    Pages 107-122
  6. Le Hoang Son, Hrudaya Kumar Tripathy, Biswa Ranjan Acharya, Raghvendra Kumar, Jyotir Moy Chatterjee
    Pages 143-165
  7. M. Venkatesh Saravanakumar, Sabibullah Mohamed Hanifa
    Pages 195-215
  8. Archana A. Chaudhari, Preeti Mulay
    Pages 237-264
  9. Le Hoang Son, Hrudaya Kumar Tripathy, Biswa Ranjan Acharya, Raghvendra Kumar, Jyotir Moy Chatterjee
    Pages E1-E1

About this book


The book describes the emergence of big data technologies and the role of Spark in the big data stack. It compares Spark with Hadoop and identifies the shortcomings of Hadoop that Spark overcomes. The book focuses mainly on the architecture of Spark and on Spark RDDs: how the RDD abstraction fits the immutable nature of big data and supports it with lazy evaluation, caching, and type inference. It also addresses advanced topics in Spark, starting with the basics of Scala and the core Spark framework, and exploring Spark DataFrames, machine learning using MLlib, graph analytics using GraphX, and real-time processing with Apache Kafka, AWS Kinesis, and Azure Event Hubs. It then goes on to investigate Spark using PySpark and R. Focusing on the current big data stack, the book examines how Spark interacts with current big data tools, serving as the core processing layer for all types of data.

The book is intended for data engineers and scientists working on massive datasets and big data technologies in the cloud. In addition to industry professionals, it is helpful for aspiring data processing professionals and students working in big data processing and cloud computing environments.


Cloud Computing, Big Data Analysis, Data Processing, Privacy Preservation, Data Analysis, Spark Cluster, Cassandra, Spark SQL

Editors and affiliations

  • Mamta Mittal (1)
  • Valentina E. Balas (2)
  • Lalit Mohan Goyal (3)
  • Raghvendra Kumar (4)
  1. Department of Computer Science and Engineering, GB Pant Government Engineering College, New Delhi, India
  2. Department of Automation and Applied Informatics, Aurel Vlaicu University of Arad, Arad, Romania
  3. Department of Computer Science and Engineering, Bharati Vidyapeeth’s College of Engineering, New Delhi, India
  4. Department of Computer Science and Engineering, Laxmi Narayan College of Technology, Jabalpur, India

Bibliographic information

  • Copyright Information: Springer Nature Singapore Pte Ltd. 2019
  • Publisher Name: Springer, Singapore
  • eBook Packages: Engineering, Engineering (R0)
  • Print ISBN: 978-981-13-0549-8
  • Online ISBN: 978-981-13-0550-4
  • Series Print ISSN: 2197-6503
  • Series Online ISSN: 2197-6511
Industry Sectors
Finance, Business & Banking
IT & Software
Consumer Packaged Goods