Data Processing

Varga, Ervin

doi:10.1007/978-1-4842-4859-1_5

Abstract

Data analysis is the central phase of a data science process. It is similar to the construction phase in software development, where actual code is produced. The focus is on handling large volumes of data to synthesize actionable insight and knowledge. Data processing is the major phase where math and software engineering skills interplay to cope with all sorts of scalability issues (size, velocity, complexity, etc.). It isn’t enough to simply pile up various technologies in the hope that everything will auto-magically align and deliver the intended outcome. Knowing the basic paradigms and mechanisms is indispensable. This is the main topic of this chapter: to introduce and exemplify pertinent concepts related to scalable data processing. Once you properly understand these concepts, you will be in a much better position to comprehend why a particular choice of technologies is the best way to go.

Notes

1. Of course, for this we would first need to normalize ratings by some bias and convert missing values to zero. Later, we would need to denormalize the prediction using the same bias. Going into these details would create an unnecessary detour from our main topic of demonstrating latent features.
2. You may want to try out LensKit (see https://lenskit.org) to research and/or build your own recommender system. It provides all the necessary infrastructure and algorithms, so you can focus only on specific aspects of your engine.
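
The steps mentioned in note 1 (normalize by a bias, zero out missing values, predict, then denormalize) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the chapter's implementation: the bias is assumed to be the global mean of the known ratings, missing ratings are assumed to be marked with None, and a single latent feature is extracted with plain power iteration so that the sketch needs only the standard library. The function names normalize, rank_one_factors, and predict are hypothetical.

```python
import math


def normalize(ratings):
    """Subtract the global mean (the assumed bias) from known ratings.

    Missing ratings (None) become 0.0, i.e. "no deviation from the bias".
    """
    known = [r for row in ratings for r in row if r is not None]
    bias = sum(known) / len(known)
    centered = [[r - bias if r is not None else 0.0 for r in row]
                for row in ratings]
    return centered, bias


def rank_one_factors(matrix, iterations=100):
    """Approximate the leading latent feature with power iteration.

    Returns (user_factors, item_factors) whose outer product is the
    rank-one approximation of the matrix. The non-uniform start vector
    is an arbitrary choice for this sketch.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [float(j + 1) for j in range(n_cols)]
    for _ in range(iterations):
        # u = M v, then w = M^T u; normalizing w gives the next estimate of v.
        u = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # With v normalized, M v carries the singular value in its length.
    u = [sum(m * x for m, x in zip(row, v)) for row in matrix]
    return u, v


def predict(ratings):
    """Predict every rating, adding the bias back (denormalization)."""
    centered, bias = normalize(ratings)
    user_factors, item_factors = rank_one_factors(centered)
    return [[bias + uf * vf for vf in item_factors] for uf in user_factors]


# Toy data: rows are users, columns are items, None marks a missing rating.
toy_ratings = [[5, 4, 1],
               [4, None, 1],
               [1, 1, 5]]
for row in predict(toy_ratings):
    print([round(value, 2) for value in row])
```

A production system would normally use more than one latent feature and a library routine for the factorization; the single-feature version here only keeps the normalize/denormalize bookkeeping visible.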

Author information

Authors and Affiliations

Kikinda, Serbia
Ervin Varga

About this chapter

Cite this chapter

Varga, E. (2019). Data Processing. In: Practical Data Science with Python 3. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4859-1_5

DOI: https://doi.org/10.1007/978-1-4842-4859-1_5
Published: 08 September 2019
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-4858-4
Online ISBN: 978-1-4842-4859-1
eBook Packages: Professional and Applied Computing, Apress Access Books, Professional and Applied Computing (R0)