Abstract
Data analysis is the central phase of a data science process. It is similar to the construction phase in software development, where actual code is produced. The focus is on handling large volumes of data to synthesize actionable insights and knowledge. Data processing is the major phase where math and software engineering skills interplay to cope with all sorts of scalability issues (size, velocity, complexity, etc.). It isn’t enough to simply pile up various technologies in the hope that everything will auto-magically align and deliver the intended outcome. Knowing the basic paradigms and mechanisms is indispensable. This is the main topic of this chapter: to introduce and exemplify pertinent concepts related to scalable data processing. Once you properly understand these concepts, you will be in a much better position to comprehend why a particular choice of technologies would be the best way to go.
Notes
1. Of course, for this we would first need to normalize ratings by some bias and convert missing values to zero. Later, we would need to denormalize the prediction using the same bias. Going into these details would create an unnecessary detour from our main topic of demonstrating latent features.
2. You may want to try out LensKit (see https://lenskit.org) to research and/or build your own recommender system. It provides all the necessary infrastructure and algorithms, so you can focus only on specific aspects of your engine.
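The normalization steps mentioned in note 1 can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the chapter: the bias is each user's mean observed rating (the note only says "some bias"), the latent features come from a rank-k truncated SVD computed with NumPy, and the function name `predict_with_bias` and the sample matrix are hypothetical.

```python
import numpy as np


def predict_with_bias(ratings, k):
    """Sketch: bias-normalized, rank-k reconstruction of a ratings matrix.

    ratings: 2-D float array (users x items); np.nan marks a missing rating.
    k: number of latent features to keep.
    """
    observed = ~np.isnan(ratings)

    # Assumed bias: the mean of each user's observed ratings.
    counts = observed.sum(axis=1, keepdims=True)
    sums = np.where(observed, ratings, 0.0).sum(axis=1, keepdims=True)
    bias = sums / np.maximum(counts, 1)

    # Normalize by the bias, then convert missing values to zero.
    centered = np.where(observed, ratings - bias, 0.0)

    # Keep only the k strongest latent features (truncated SVD).
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    approx = (u[:, :k] * s[:k]) @ vt[:k, :]

    # Denormalize the prediction using the same bias.
    return approx + bias


# Hypothetical ratings: 3 users x 3 items, with two missing entries.
ratings = np.array([
    [5.0, 4.0, np.nan],
    [1.0, np.nan, 2.0],
    [4.0, 5.0, 3.0],
])
predictions = predict_with_bias(ratings, k=2)
```

Because missing entries are zero after centering, a missing rating is treated as "equal to the user's average" before the factorization, which is one simple way to read the note; other bias definitions (global or per-item means) would fit the same pattern.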
Copyright information
© 2019 Ervin Varga
About this chapter
Cite this chapter
Varga, E. (2019). Data Processing. In: Practical Data Science with Python 3. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4859-1_5
DOI: https://doi.org/10.1007/978-1-4842-4859-1_5
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-4858-4
Online ISBN: 978-1-4842-4859-1
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)