In this chapter, we present statistical modelling approaches for predictive tasks in business and science. The most prominent is multiple linear regression, in which the coefficients are estimated by ordinary least squares. There are many variants and generalizations of that technique. In the form of logistic regression, it has been adapted to binary classification problems, and various statistical survival models allow for the modelling of time-to-event data. We detail the many benefits and a few pitfalls of these techniques using real-world examples. A primary focus is the added value that these statistical modelling tools offer over more black-box machine-learning algorithms. In our opinion, this added value stems predominantly from three sources: the often much easier interpretation of the model, the availability of tools that pin down the influence of the predictor variables in concise form, and the options for variable selection and residual analysis, which allow for user-friendly model development, refinement, and improvement.
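As a rough illustration of the interpretability argument, the following minimal sketch fits a multiple linear regression by ordinary least squares and returns the coefficients together with the residuals. It is not taken from the chapter: the function name `fit_ols`, the synthetic data, and the chosen true coefficients are assumptions made purely for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Estimate intercept and slopes by ordinary least squares (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes the sum of squared residuals ||design @ beta - y||^2.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return beta, residuals

# Hypothetical synthetic data: y = 1 + 2*x1 - 0.5*x2 + small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

beta, residuals = fit_ols(X, y)
print("intercept and slopes:", beta)
print("mean residual:", residuals.mean())
```

Each estimated slope can be read directly as the expected change in the response per unit change in that predictor, holding the other predictor fixed, and the residual vector is available for the kind of diagnostic checks mentioned above.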
The authors thank the editors for their constructive comments, which have led to significant improvements of this article.