Abstract
This chapter introduces the concept of a statistical model. Although this book focuses on one particular type of statistical model, the generalized linear model, we begin with statistical models in general so that we can introduce the necessary language and notation and raise other important issues. We first discuss conventions for describing data mathematically (Sect. 1.2). We then highlight the importance of plotting data (Sect. 1.3) and explain how to code non-numerical variables numerically (Sect. 1.4) so that they can be used in mathematical models. Next we introduce the two components of a statistical model used for understanding data (Sect. 1.5): the systematic and random components. The class of regression models, which includes all models in this book, is introduced in Sect. 1.6. We consider model interpretation (Sect. 1.7) and compare physical and statistical models (Sect. 1.8) to highlight their similarities and differences. We give the purpose of a statistical model (Sect. 1.9) and describe the two criteria for evaluating statistical models: accuracy and parsimony (Sect. 1.10). We address the importance of understanding the limitations of statistical models (Sect. 1.11), including the differences between observational and experimental data, and discuss the generalizability of models (Sect. 1.12). Finally, we make some introductory comments about using R for statistical modelling (Sect. 1.13).
…all models are approximations. Essentially, all models are wrong, but some are useful. However, the approximate nature of the model must always be borne in mind.
Box and Draper [2, p. 424]
References
Agresti, A.: An Introduction to Categorical Data Analysis, second edn. Wiley-Interscience (2007)
Box, G.E.P., Draper, N.R.: Empirical Model-Building and Response Surfaces. Wiley, New York (1987)
Brockmann, H.J.: Satellite male groups in horseshoe crabs, Limulus polyphemus. Ethology 102, 1–21 (1996)
Dunn, P.K., Smyth, G.K.: GLMsData: Generalized linear model data sets (2017). URL https://CRAN.R-project.org/package=GLMsData. R package version 1.0.0
Efron, B.: Double exponential families and their use in generalized linear regression. Journal of the American Statistical Association 81(395), 709–721 (1986)
Giauque, W.F., Wiebe, R.: The heat capacity of hydrogen bromide from 15°K. to its boiling point and its heat of vaporization. The entropy from spectroscopic data. Journal of the American Chemical Society 51(5), 1441–1449 (1929)
Hand, D.J., Daly, F., Lunn, A.D., McConway, K.Y., Ostrowski, E.: A Handbook of Small Data Sets. Chapman and Hall, London (1996)
Joglekar, G., Scheunemyer, J.H., LaRiccia, V.: Lack-of-fit testing when replicates are not available. The American Statistician 43, 135–143 (1989)
Johnson, B., Courtney, D.M.: Tower building. Child Development 2(2), 161–162 (1931)
Kahn, M.: An exhalent problem for teaching statistics. Journal of Statistics Education 13(2) (2005)
Maron, M.: Threshold effect of eucalypt density on an aggressive avian competitor. Biological Conservation 136, 100–107 (2007)
Mazess, R.B., Peppler, W.W., Gibbons, M.: Total body composition by dual-photon (153Gd) absorptiometry. American Journal of Clinical Nutrition 40, 834–839 (1984)
Myers, R.H., Montgomery, D.C., Vining, G.G.: Generalized Linear Models with Applications in Engineering and the Sciences. Wiley, Chichester (2002)
Nelson, W.: Applied Life Data Analysis. Wiley Series in Probability and Statistics. John Wiley & Sons, New York (1982)
Royston, P., Altman, D.G.: Regression using fractional polynomials of continuous covariates: Parsimonious parametric modelling. Journal of the Royal Statistical Society, Series C 43(3), 429–467 (1994)
Shacham, M., Brauner, N.: Minimizing the effects of collinearity in polynomial regression. Industrial & Engineering Chemistry Research 36, 4405–4412 (1997)
Singer, J.D., Willett, J.B.: Improving the teaching of applied statistics: Putting the data back into data analysis. The American Statistician 44(3), 223–230 (1990)
Smyth, G.K.: Australasian data and story library (OzDASL) (2011). URL http://www.statsci.org/data
Tager, I.B., Weiss, S.T., Muñoz, A., Rosner, B., Speizer, F.E.: Longitudinal study of the effects of maternal smoking on pulmonary function in children. New England Journal of Medicine 309(12), 699–703 (1983)
Tager, I.B., Weiss, S.T., Rosner, B., Speizer, F.E.: Effect of parental cigarette smoking on the pulmonary function of children. American Journal of Epidemiology 110(1), 15–26 (1979)
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this chapter
Cite this chapter
Dunn, P.K., Smyth, G.K. (2018). Chapter 1: Statistical Models. In: Generalized Linear Models With Examples in R. Springer Texts in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-0118-7_1
DOI: https://doi.org/10.1007/978-1-4419-0118-7_1
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-0117-0
Online ISBN: 978-1-4419-0118-7
eBook Packages: Mathematics and Statistics (R0)