Spark Architecture and the Resilient Distributed Dataset

Mishra, Raju Kumar

doi:10.1007/978-1-4842-3141-8_4

Abstract

You learned Python in the preceding chapter. Now it is time to learn PySpark and utilize the power of a distributed system to solve problems related to big data. We generally distribute large amounts of data on a cluster and perform processing on that distributed data.

Author information

Authors and Affiliations

Bangalore, Karnataka, India
Raju Kumar Mishra

About this chapter

Cite this chapter

Mishra, R.K. (2018). Spark Architecture and the Resilient Distributed Dataset. In: PySpark Recipes. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-3141-8_4

DOI: https://doi.org/10.1007/978-1-4842-3141-8_4
Published: 10 December 2017
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-3140-1
Online ISBN: 978-1-4842-3141-8
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)
