Statistical Modelling

  • Marcel Dettling
  • Andreas Ruckstuhl


In this chapter, we present statistical modelling approaches for predictive tasks in business and science. Most prominent is the ubiquitous multiple linear regression approach, where coefficients are estimated by ordinary least squares. There are many variants and generalizations of that technique. In the form of logistic regression, it has been adapted to cope with binary classification problems. Various statistical survival models allow for modelling of time-to-event data. We will detail the many benefits and a few pitfalls of these techniques based on real-world examples. A primary focus will be on pointing out the added value that these statistical modelling tools yield over more black-box machine-learning algorithms. In our opinion, the added value predominantly stems from the often much easier interpretation of the model, the availability of tools that pin down the influence of the predictor variables in concise form, and finally from the options they provide for variable selection and residual analysis, allowing for user-friendly model development, refinement, and improvement.
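
To make the idea of multiple linear regression with least-squares coefficient estimates concrete, the following is a minimal sketch in Python using NumPy. It is not taken from the chapter: the function name `fit_ols`, the simulated data, and the coefficient values in `true_beta` are assumptions chosen purely for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Estimate intercept and slopes by ordinary least squares.

    Hypothetical helper for illustration; returns the coefficient
    vector (intercept first) and the residuals of the fit.
    """
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    # lstsq minimizes the sum of squared residuals ||y - design @ beta||^2.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return beta, residuals


# Simulated data (an assumption, not data from the chapter):
# two predictors, known coefficients, small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_beta = np.array([1.0, 2.0, -0.5])  # intercept, slope 1, slope 2
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.1, size=200)

beta_hat, resid = fit_ols(X, y)
print("estimated coefficients:", beta_hat)
print("residual standard deviation:", resid.std(ddof=3))
```

Each estimated coefficient can be read directly as the expected change in the response per unit change in one predictor, holding the others fixed, and the residuals are available for the kind of diagnostic analysis the abstract refers to.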





The authors thank the editors for their constructive comments, which have led to significant improvements of this article.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Institute for Data Analysis and Process Design, ZHAW Zurich University of Applied Sciences, Winterthur, Switzerland
