Spark Architecture and the Resilient Distributed Dataset

Abstract

You learned Python in the preceding chapter. Now it is time to learn PySpark and use the power of a distributed system to solve big-data problems. We generally distribute large amounts of data across a cluster and process that distributed data in parallel. A minimal sketch of this idea follows.
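The following sketch is not from the chapter; it simply illustrates the idea of distributing a collection across a cluster and processing it with PySpark's RDD API. It assumes a local SparkContext (master="local[*]"); on a real cluster the master URL and application name would differ.

    # Minimal sketch: distribute data as an RDD and process it in parallel.
    from pyspark import SparkContext

    sc = SparkContext(master="local[*]", appName="rdd-sketch")

    # parallelize() distributes a Python collection across the cluster
    # as an RDD split into 4 partitions (numSlices is illustrative).
    numbers = sc.parallelize(range(1, 1001), numSlices=4)

    # map() and reduce() run on the distributed partitions.
    total = numbers.map(lambda x: x * x).reduce(lambda a, b: a + b)

    print(total)  # sum of squares of 1..1000
    sc.stop()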

Copyright information

© 2018 Raju Kumar Mishra

Cite this chapter

Mishra, R.K. (2018). Spark Architecture and the Resilient Distributed Dataset. In: PySpark Recipes. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-3141-8_4
