Abstract
In this chapter we explore machine learning. This topic is closely related to statistical modeling, which we considered in Chapter 14, in the sense that both deal with using data to describe and predict outcomes of uncertain or unknown processes. However, while statistical modeling emphasizes the model used in the analysis, machine learning sidesteps the model part and focuses on algorithms that can be trained to predict the outcome of new observations. In other words, the approach taken in statistical modeling emphasizes understanding how the data is generated, by devising models and tuning their parameters by fitting to the data. If the model is found to fit the data well and if it satisfies the relevant model assumptions, then the model gives an overall description of the process, and it can be used to compute statistics with known distributions and for evaluating statistical tests. However, if the actual data is too complex to be explained using available statistical models, this approach has reached its limits. In machine learning, on the other hand, the actual process that generates the data, and potential models thereof, is not central. Instead, the observed data and the explanatory variables are the fundamental starting point of a machine-learning application. Given data, machine-learning methods can be used to find patterns and structure in the data, which can be used to predict the outcome of new observations. Machine learning therefore does not provide an understanding of how data was generated, and because fewer assumptions are made regarding the distribution and statistical properties of the data, we typically cannot compute statistics and perform statistical tests regarding the significance of certain observations. Instead, machine learning puts a strong emphasis on the accuracy with which new observations are predicted.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
In practice it is common to work with both statsmodels and scikit-learn, as they in many respects complement each other. However, in this chapter we focus solely on scikit-learn.
- 2.
However, note that we can never be sure that a machine-learning application does not suffer from overfitting before we see how the application performs on new observations, and a repeated reevaluation of the application on a regular basis is a good practice.
Author information
Authors and Affiliations
Rights and permissions
Copyright information
© 2019 Robert Johansson
About this chapter
Cite this chapter
Johansson, R. (2019). Machine Learning. In: Numerical Python . Apress, Berkeley, CA. https://doi.org/10.1007/978-1-4842-4246-9_15
Download citation
DOI: https://doi.org/10.1007/978-1-4842-4246-9_15
Published:
Publisher Name: Apress, Berkeley, CA
Print ISBN: 978-1-4842-4245-2
Online ISBN: 978-1-4842-4246-9
eBook Packages: Professional and Applied ComputingApress Access BooksProfessional and Applied Computing (R0)