Chapter 1: Statistical Models

  • Peter K. Dunn
  • Gordon K. Smyth
Part of the Springer Texts in Statistics book series (STS)


This chapter introduces the concept of a statistical model. One particular type of statistical model, the generalized linear model, is the focus of this book, so we begin with an introduction to statistical models in general, which lets us establish the necessary language, notation, and related issues. We first discuss conventions for describing data mathematically (Sect. 1.2). We then highlight the importance of plotting data (Sect. 1.3) and explain how to numerically code non-numerical variables (Sect. 1.4) so that they can be used in mathematical models. Next we introduce the two components of a statistical model used for understanding data (Sect. 1.5): the systematic and random components. The class of regression models, which includes all models in this book, is introduced in Sect. 1.6. We then consider model interpretation (Sect. 1.7) and compare physical and statistical models (Sect. 1.8) to highlight their similarities and differences. The purpose of a statistical model is given (Sect. 1.9), followed by the two criteria for evaluating statistical models: accuracy and parsimony (Sect. 1.10). We address the importance of understanding the limitations of statistical models (Sect. 1.11), including the differences between observational and experimental data, and then discuss the generalizability of models (Sect. 1.12). Finally, we make some introductory comments about using R for statistical modelling (Sect. 1.13).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Peter K. Dunn (1)
  • Gordon K. Smyth (2)
  1. Faculty of Science, Health, Education and Engineering, School of Health and Sport Science, University of the Sunshine Coast, Queensland, Australia
  2. Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia
