Overview
- Presents advanced features of PySpark and code optimization techniques
- Covers SparkSQL, Spark Streaming, Spark MLlib, and GraphFrames
- Discusses and demonstrates Data Science and Big Data processing with PySpark MLlib
Table of contents (9 chapters)
About this book
PySpark Recipes covers Hadoop and its shortcomings, then presents the architecture of Spark, PySpark, and RDDs. You will learn to apply RDDs to solve day-to-day big data problems. Python and NumPy are also covered, which makes it easier for newcomers to PySpark to understand and adopt the model.
What You Will Learn
- Understand the advanced features of PySpark2 and SparkSQL
- Optimize your code
- Program SparkSQL with Python
- Use Spark Streaming and Spark MLlib with Python
- Perform graph analysis with GraphFrames
Who This Book Is For
Data analysts, Python programmers, big data enthusiasts
Bibliographic Information
Book Title: PySpark Recipes
Book Subtitle: A Problem-Solution Approach with PySpark2
Authors: Raju Kumar Mishra
DOI: https://doi.org/10.1007/978-1-4842-3141-8
Publisher: Apress, Berkeley, CA
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)
Copyright Information: Raju Kumar Mishra 2018
Softcover ISBN: 978-1-4842-3140-1 (Published: 10 December 2017)
eBook ISBN: 978-1-4842-3141-8 (Published: 09 December 2017)
Edition Number: 1
Number of Pages: XXIII, 265
Number of Illustrations: 35 b/w illustrations, 12 illustrations in colour
Topics: Big Data, Programming Techniques, Programming Languages, Compilers, Interpreters, Data Mining and Knowledge Discovery
Industry Sectors: IT & Software