Abstract
In Chapter 5 we consider ways to compare the means of two populations. Now we extend these procedures to comparisons of means from several populations. For example, we may wish to compare the average hourly production of a company’s six factories. We say that the investigation has a factor, factory, that has six levels, namely the six identifiers distinguishing the factories from one another. Or we may wish to compare the yields per acre of five different varieties of wheat. Here, the factor is wheat, and the levels of wheat are variety1 through variety5. This chapter discusses investigations having a single factor. Experiments having two factors are discussed in Chapter 12, while situations with two or more factors are discussed in Chapters 13 and 14.
Appendices
6.A Appendix: Computation for the Analysis of Variance
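Model formulas are expressed in R with a symbolic notation which is a simplification of the more extended traditional notation \(y_{ij} =\mu +\alpha _{i} +\epsilon _{ij}\) for \(i = 1,\ldots,a\) and \(j = 1,\ldots,n_{i}\).
The intercept term \(\mu\) and the error term \(\epsilon _{ij}\) are usually assumed, so they are not written in the R formula. The existence of the subscripts is implied and the actual values are specified by the data values.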
With R we will be using aov for the calculations and anova and related commands for the display of the results. aov can be used with equal or unequal cell sizes \(n_{i}\). Model (6.1) is denoted in R by the formula
Y ~ A
The operator ~ is read as “is modeled by”.
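As a minimal sketch of how the formula is used, the following fits a one-factor model with aov and displays it with anova. The PlantGrowth data frame that ships with base R is an assumption made for illustration; it stands in for the chapter's own data, with weight playing the role of Y and group the role of A.

```r
# PlantGrowth (base R 'datasets' package) is a stand-in for the chapter's data:
# weight is the response Y, group is the single factor A with three levels.
data(PlantGrowth)

# Y ~ A: the intercept and the error term are implied by the formula.
fit <- aov(weight ~ group, data = PlantGrowth)

# Display the analysis of variance table.
anova(fit)
```

The same call works whether the cell sizes are equal, as they are in this data set, or unequal.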
Two different algorithms are used to calculate the analysis of variance for data with one factor: sums of squared differences of cell means and regression on dummy variables. Both give identical results.
The intuition of the analysis is most easily developed with the sums of squared differences algorithm. We began there in Equation 6.6 and the definitions in the notes to Table 6.2. We show in Table 6.10 the partitioning of the observed values for the response variable concent in the catalystm example into columns associated with the terms in the model. The sum of each row reproduces the response variable. This is called the linear identity. The sums of squares of the columns are the sums of squares in the ANOVA table. This is called the quadratic identity. In the notation of Table 6.2 the numbers in the (Intercept) column are \(\bar{\bar{y}}\), the numbers in the catalyst column are the treatment effects \(\bar{y}_{i} -\bar{\bar{ y}}\), and the numbers in the Residuals column are \(y_{ij} -\bar{ y}_{i}\). The numbers in the result of the apply statement are the sums of squares: \(\sum _{ij}\bar{\bar{y}}^{2}\), \(\mathsf{SS}_{\mathrm{Tr}} =\sum _{ i=1}^{a}n_{i}(\bar{y}_{i} -\bar{\bar{ y}})^{2}\), \(\mathsf{SS}_{\mathrm{Res}} =\sum _{ i=1}^{a}\sum _{j=1}^{n_{i}}(y_{ij} -\bar{y}_{i})^{2}\), and \(\sum _{ij}y_{ij}^{2}\). We come back to the linear and quadratic identities in Table 8.6.
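The two identities can be checked by hand. The sketch below builds the three columns directly from the definitions above rather than reproducing the code behind Table 6.10, and it uses the base R PlantGrowth data as a stand-in for catalystm; the variable names y, g, and decomp are chosen here for illustration.

```r
# Stand-in data: weight is the response, group is the factor.
y <- PlantGrowth$weight
g <- PlantGrowth$group

grand    <- rep(mean(y), length(y))  # grand mean, repeated for every observation
cellmean <- ave(y, g)                # cell mean of each observation's group

# One column per term in the model.
decomp <- cbind("(Intercept)" = grand,
                group         = cellmean - grand,  # treatment effects
                Residuals     = y - cellmean)
decomp <- cbind(decomp, Sum = rowSums(decomp))

# Linear identity: each row sums to the observed response.
all.equal(unname(decomp[, "Sum"]), y)

# Quadratic identity: column sums of squares are the ANOVA sums of squares,
# and the first three add up to the sum of squared observations.
ss <- apply(decomp, 2, function(x) sum(x^2))
ss

# Compare with the sums of squares reported by aov.
fit <- aov(weight ~ group, data = PlantGrowth)
anova(fit)[["Sum Sq"]]
```

The group and Residuals entries of ss should agree with the Sum Sq column of the ANOVA table, and the Sum entry equals the total of the other three.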
The regression formulation is easier to work with and generalizes better. Once we have developed our intuition we will usually work with the regression formulation. The discussion of contrasts in Section 6.9 leads in to the regression formulation in Chapter 10. For the moment, in Table 6.11 we step forward into the notation of Chapter 10 and express the catalystm example in regression notation.
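A minimal sketch of the regression-on-dummy-variables route follows. It is not the code behind Table 6.11; it assumes the base R PlantGrowth data as a stand-in for catalystm and R's default treatment contrasts, and the names X, dummies, fit.lm, and fit.aov are illustrative.

```r
# Dummy-variable coding of the factor: an intercept column plus one
# indicator column for each level after the first (default treatment contrasts).
X <- model.matrix(~ group, data = PlantGrowth)
head(X)

# Regress the response on the dummy columns; the intercept is implied.
dummies <- X[, -1]
fit.lm  <- lm(PlantGrowth$weight ~ dummies)

# The same model through the factor formula.
fit.aov <- aov(weight ~ group, data = PlantGrowth)

# Both routes give the same treatment and residual sums of squares.
anova(fit.lm)[["Sum Sq"]]
anova(fit.aov)[["Sum Sq"]]
```

Because the dummy columns span the same space as the factor, the fitted values and the ANOVA table do not depend on which formulation is used.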
6.B Object Oriented Programming
Many of R’s functions are designed to be sensitive to the class of object to which they are applied. Figure 6.7 shows that the same syntax plot(x) produces a different form of plot depending on the class of the argument x.
The result of a function call (aov for example) is an object with a class ("aov"). Accessor functions such as summary or plot are sensitive to the class of their argument and produce an appropriate form of output as shown in Figures 6.7 and 6.8.
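The class sensitivity described here can be seen directly at the console. The sketch below uses summary rather than plot so that no graphics device is needed; plot dispatches on the class of its argument in the same way. The PlantGrowth data are again an assumed stand-in for the chapter's example.

```r
fit <- aov(weight ~ group, data = PlantGrowth)

# Each object carries a class attribute.
class(fit)                  # the result of aov
class(PlantGrowth$group)    # a factor
class(PlantGrowth$weight)   # a numeric vector

# The same call, summary(x), produces a different form of output
# depending on the class of x.
summary(fit)                 # ANOVA table
summary(PlantGrowth$group)   # counts per level
summary(PlantGrowth$weight)  # five-number summary and mean

# The result of summary on an aov object has its own class,
# which in turn controls how it is printed.
class(summary(fit))
```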
Copyright information
© 2015 Springer Science+Business Media New York
About this chapter
Cite this chapter
Heiberger, R.M., Holland, B. (2015). One-Way Analysis of Variance. In: Statistical Analysis and Data Display. Springer Texts in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2122-5_6
DOI: https://doi.org/10.1007/978-1-4939-2122-5_6
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2121-8
Online ISBN: 978-1-4939-2122-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)