Data Processing

Chapter in Practical Data Science with Python 3

Abstract

Data analysis is the central phase of a data science process. It is similar to the construction phase in software development, where actual code is produced. The focus is on being able to handle large volumes of data to synthesize actionable insight and knowledge. Data processing is the major phase where math and software engineering skills interplay to cope with all sorts of scalability issues (size, velocity, complexity, etc.). It isn’t enough to simply pile up various technologies in the hope that all will auto-magically align and deliver the intended outcome. Knowing the basic paradigms and mechanisms is indispensable. This is the main topic of this chapter: to introduce and exemplify pertinent concepts related to scalable data processing. Once you properly understand these concepts, you will be in a much better position to comprehend why a particular choice of technologies would be the best way to go.

Notes

  1. Of course, for this we would first need to normalize ratings by subtracting some bias and convert missing values to zero. Later, we would need to denormalize the prediction by adding the same bias back. Going into these details would create an unnecessary detour from our main topic of demonstrating latent features.

  2. You may want to try out LensKit (see https://lenskit.org) to research and/or build your own recommender system. It provides all the necessary infrastructure and algorithms, so you can focus only on specific aspects of your engine.
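Footnote 1 only outlines the normalization step the chapter skips. The following is a minimal sketch of that step, not the chapter's own code. It assumes a NumPy user-item matrix with np.nan marking missing ratings, uses each user's mean observed rating as the bias (one common choice, assumed here), and uses a truncated SVD as the way to extract k latent features. The function name predict_with_latent_features and the sample matrix R are hypothetical.

```python
import numpy as np


def predict_with_latent_features(ratings: np.ndarray, k: int) -> np.ndarray:
    """Rank-k reconstruction of a user-item rating matrix (illustrative sketch).

    ratings: 2-D array, rows = users, columns = items, np.nan = missing.
    Assumption: every user has at least one observed rating.
    """
    observed = ~np.isnan(ratings)
    # Bias: each user's mean over observed ratings (an assumed choice of bias).
    user_bias = np.nanmean(ratings, axis=1, keepdims=True)
    # Normalize by subtracting the bias, then convert missing values to zero.
    centered = np.where(observed, ratings - user_bias, 0.0)
    # Truncated SVD: keep only the k strongest latent features.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    approx = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]
    # Denormalize the prediction by adding the same bias back.
    return approx + user_bias


# Hypothetical ratings: 4 users x 4 items, np.nan = not rated.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])

pred = predict_with_latent_features(R, k=2)
print(np.round(pred, 2))
```

With k smaller than the matrix rank, the formerly missing cells receive estimates shaped by the shared latent features. With k equal to the full rank, the observed ratings are reproduced exactly and the missing cells fall back to the user bias. Zero-filling after centering is the simplification the footnote describes. Dedicated libraries such as LensKit offer more careful alternatives.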

Copyright information

© 2019 Ervin Varga

About this chapter

Cite this chapter

Varga, E. (2019). Data Processing. In: Practical Data Science with Python 3. Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4859-1_5