Abstract
You learned Python in the preceding chapter. Now it is time to learn PySpark and utilize the power of a distributed system to solve problems related to big data. We generally distribute large amounts of data on a cluster and perform processing on that distributed data.
Copyright information
© 2018 Raju Kumar Mishra
About this chapter
Cite this chapter
Mishra, R.K. (2018). Spark Architecture and the Resilient Distributed Dataset. In: PySpark Recipes. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-3141-8_4
DOI: https://doi.org/10.1007/978-1-4842-3141-8_4
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-3140-1
Online ISBN: 978-1-4842-3141-8
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)