Optimizing PySpark and PySpark Streaming

  • Raju Kumar Mishra
Chapter

Abstract

Spark is a distributed framework that facilitates parallel processing. Parallel algorithms require both computation on and communication between machines. During communication, machines send or exchange data; in Spark this data exchange across machines is known as shuffling.
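The idea behind a shuffle can be sketched in plain Python: records are routed to partitions (machines) by hashing their keys, so that all records sharing a key end up on the same machine and can then be aggregated locally. This is an illustrative sketch only, not Spark's actual implementation; the function name `hash_partition` is hypothetical.

```python
# Illustrative sketch of shuffle routing (NOT Spark's real implementation):
# each (key, value) record is sent to the partition chosen by hashing its key.
def hash_partition(records, num_partitions):
    """Route records to partitions by hash of the key, as a shuffle would."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        # Records with equal keys always hash to the same partition,
        # so a per-key aggregation (e.g. reduceByKey) can then run locally.
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

records = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
parts = hash_partition(records, 2)
```

Because every record must cross the network to reach its target partition, shuffling is expensive, which is why minimizing it is central to optimizing PySpark jobs.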

Copyright information

©  Raju Kumar Mishra 2018

Authors and Affiliations

  • Raju Kumar Mishra
    • 1
  1. Bangalore, India