One-Way Analysis of Variance

Heiberger, Richard M.; Holland, Burt

doi:10.1007/978-1-4939-2122-5_6

Part of the book series: Springer Texts in Statistics (STS)

Abstract

In Chapter 5 we considered ways to compare the means of two populations. Now we extend these procedures to comparisons of means from several populations. For example, we may wish to compare the average hourly production of a company’s six factories. We say that the investigation has a factor factory that has six levels, namely the six identifiers distinguishing the factories from one another. Or we may wish to compare the yields per acre of five different varieties of wheat. Here, the factor is wheat, and the levels of wheat are variety1 through variety5. This chapter discusses investigations having a single factor. Experiments having two factors are discussed in Chapter 12, while situations with two or more factors are discussed in Chapters 13 and 14.

Author information

Authors and Affiliations

Department of Statistics, Temple University, Philadelphia, PA, USA
Richard M. Heiberger & Burt Holland

Appendices

6.A Appendix: Computation for the Analysis of Variance

Model formulas are expressed in R with a symbolic notation which is a simplification of the more extended traditional notation

$$y_{ij} = \mu + \alpha_{i} + \epsilon_{ij} \quad \text{for} \quad i = 1,\ldots,a \quad \text{and} \quad j = 1,\ldots,n_{i} \tag{6.1}$$

The intercept term $\mu$ and the error term $\epsilon_{ij}$ are usually assumed. The existence of the subscripts is implied and the actual values are specified by the data values.

With R we will be using aov for the calculations and anova and related commands for the display of the results. aov can be used with equal or unequal cell sizes $n_{i}$. Model (6.1) is denoted in R by the formula

Y ~ A

The operator ~ is read as “is modeled by”.
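As a minimal sketch of how a formula of the form Y ~ A is used, the R code below fits Model (6.1) with aov and displays the result with anova. The data frame is typed in by hand and is assumed to match the catalystm data set distributed with the HH package (response concent, factor catalyst); the object name catalystm.aov is chosen here only for illustration.

```r
# Hand-entered values, assumed to match the catalystm data in the HH package.
catalystm <- data.frame(
  catalyst = factor(rep(c("A", "B", "C", "D"), times = c(5, 4, 3, 4))),
  concent  = c(58.2, 57.2, 58.4, 55.8, 54.9,   # catalyst A
               56.3, 54.5, 57.0, 55.3,         # catalyst B
               50.1, 54.2, 55.4,               # catalyst C
               52.9, 49.9, 50.0, 51.7)         # catalyst D
)

# concent "is modeled by" catalyst. The intercept and the error term
# are implied by the formula and do not need to be written out.
catalystm.aov <- aov(concent ~ catalyst, data = catalystm)

# Display the ANOVA table: one row for catalyst, one for Residuals.
anova(catalystm.aov)
```

The cell sizes in this example are unequal (5, 4, 3, and 4 observations), and the same formula is used as for equal cell sizes.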

Two different algorithms are used to calculate the analysis of variance for data with one factor: sums of squared differences of cell means and regression on dummy variables. Both give identical results.

The intuition of the analysis is most easily developed with the sums of squared differences algorithm. We began there in Equation 6.6 and the definitions in the notes to Table 6.2. We show in Table 6.10 the partitioning of the observed values for the response variable concent in the catalystm example into columns associated with the terms in the model. The sum of each row reproduces the response variable. This is called the linear identity. The sum of the squares in each column is the ANOVA table. This is called the quadratic identity. In the notation of Table 6.2 the numbers in the (Intercept) column are $\bar{\bar{y}}$, the numbers in the catalyst column are the treatment effects $\bar{y}_{i} -\bar{\bar{y}}$, and the numbers in the Residuals column are $y_{ij} -\bar{y}_{i}$. The numbers in the result of the apply statement are the sums of squares: $\sum _{ij}\bar{\bar{y}}^{2}$, $\mathsf{SS}_{\mathrm{Tr}} =\sum _{i=1}^{a}n_{i}(\bar{y}_{i} -\bar{\bar{y}})^{2}$, $\mathsf{SS}_{\mathrm{Res}} =\sum _{i=1}^{a}\sum _{j=1}^{n_{i}}(y_{ij} -\bar{y}_{i})^{2}$, and $\sum _{ij}y_{ij}^{2}$. We come back to the linear and quadratic identities in Table 8.6.

Table 6.10 Linear and quadratic identities for the one way Analysis of Variance. The column labeled Sum is the sum of the three columns of the projection matrix onto the space of the Grand Mean (labeled (Intercept)), the effects due to the factor catalyst, and the Residuals. The Sum column is identical to the observed response variable concent. The sums of squares of each column of the projection matrix are the numbers in the similarly labeled row in the “Sum of Squares” column of the ANOVA table.
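The linear and quadratic identities can be checked numerically. The sketch below is not necessarily the code behind Table 6.10: it assumes hand-entered values taken to match the catalystm data in the HH package, uses proj from the stats package to split the response into one column per model term, and uses illustrative object names (catalystm.proj, identities, col.ss).

```r
# Hand-entered values, assumed to match the catalystm data in the HH package.
catalystm <- data.frame(
  catalyst = factor(rep(c("A", "B", "C", "D"), times = c(5, 4, 3, 4))),
  concent  = c(58.2, 57.2, 58.4, 55.8, 54.9,
               56.3, 54.5, 57.0, 55.3,
               50.1, 54.2, 55.4,
               52.9, 49.9, 50.0, 51.7)
)

catalystm.aov <- aov(concent ~ catalyst, data = catalystm)

# One column per term: (Intercept), catalyst, Residuals.
catalystm.proj <- proj(catalystm.aov)

# Linear identity: the row sums reproduce the observed response.
identities <- cbind(catalystm.proj, Sum = rowSums(catalystm.proj))
all.equal(unname(identities[, "Sum"]), catalystm$concent)

# Quadratic identity: the column sums of squares are the entries of the
# "Sum of Squares" column of the ANOVA table, and the first three add up
# to the sum of squares of the Sum column.
col.ss <- apply(identities, 2, function(x) sum(x^2))
round(col.ss, 1)
```

The catalyst and Residuals entries of col.ss can be compared directly with the sums of squares reported by anova(catalystm.aov).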

The regression formulation is easier to work with and generalizes better. Once we have developed our intuition we will usually work with the regression formulation. The discussion of contrasts in Section 6.9 leads into the regression formulation in Chapter 10. For the moment, in Table 6.11 we step forward into the notation of Chapter 10 and express the catalystm example in regression notation.

Table 6.11 The aov by the factor catalyst in Table 6.10 is identical to the lm shown here by the three dummy variables generated from the catalyst factor. The degrees of freedom (1+1+1=3) and the Sums of Squares (8.8+2.7+74.1=85.7, up to rounding) are both the same.
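The equivalence of the two formulations can be illustrated with a short sketch. It assumes hand-entered values taken to match the catalystm data in the HH package and R's default treatment contrasts, with catalyst A as the reference level. The dummy variable names cB, cC, and cD are hypothetical and are not necessarily those used in Table 6.11.

```r
# Hand-entered values, assumed to match the catalystm data in the HH package.
catalystm <- data.frame(
  catalyst = factor(rep(c("A", "B", "C", "D"), times = c(5, 4, 3, 4))),
  concent  = c(58.2, 57.2, 58.4, 55.8, 54.9,
               56.3, 54.5, 57.0, 55.3,
               50.1, 54.2, 55.4,
               52.9, 49.9, 50.0, 51.7)
)

# Dummy (indicator) variables generated from the factor.
# model.matrix() uses treatment contrasts by default for unordered factors.
X <- model.matrix(~ catalyst, data = catalystm)
catalystm$cB <- X[, "catalystB"]
catalystm$cC <- X[, "catalystC"]
catalystm$cD <- X[, "catalystD"]

# Regression on the three dummy variables: three 1-df sequential terms.
catalystm.lm <- lm(concent ~ cB + cC + cD, data = catalystm)
anova(catalystm.lm)

# One-way ANOVA on the factor: a single 3-df term.
catalystm.aov <- aov(concent ~ catalyst, data = catalystm)
anova(catalystm.aov)

# The three 1-df sums of squares add up to the 3-df catalyst sum of squares.
sum(anova(catalystm.lm)[["Sum Sq"]][1:3])
```

The individual one-degree-of-freedom sums of squares are sequential, so they depend on the order in which the dummy variables enter the model; only their total is tied to the catalyst row of the ANOVA table.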

6.B Object Oriented Programming

Many of R’s functions are designed to be sensitive to the class of object to which they are applied. Figure 6.7 shows that the same syntax plot(x) produces a different form of plot depending on the class of the argument x.

The result of a function call (aov for example) is an object with a class ("aov"). Accessor functions such as summary or plot are sensitive to the class of their argument and produce an appropriate form of output as shown in Figures 6.7 and 6.8.
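A small sketch of this class-sensitive behavior follows. It uses summary rather than plot so that it runs without a graphics device, and it assumes hand-entered values taken to match the catalystm data in the HH package; it is not the code behind Figures 6.7 and 6.8.

```r
# Hand-entered values, assumed to match the catalystm data in the HH package.
catalystm <- data.frame(
  catalyst = factor(rep(c("A", "B", "C", "D"), times = c(5, 4, 3, 4))),
  concent  = c(58.2, 57.2, 58.4, 55.8, 54.9,
               56.3, 54.5, 57.0, 55.3,
               50.1, 54.2, 55.4,
               52.9, 49.9, 50.0, 51.7)
)

catalystm.aov <- aov(concent ~ catalyst, data = catalystm)

# Each object carries a class attribute.
class(catalystm$catalyst)   # a factor
class(catalystm$concent)    # a numeric vector
class(catalystm.aov)        # includes "aov"

# The same call, summary(x), gives a different form of output for each class:
summary(catalystm$catalyst)  # counts of observations per level
summary(catalystm$concent)   # quantiles and mean
summary(catalystm.aov)       # ANOVA table

# The result of summary() on an aov object has a class of its own,
# which in turn selects how it is printed.
class(summary(catalystm.aov))
```

Calling plot on these same three objects follows the same pattern, with the class of the argument selecting the form of the plot.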

About this chapter

Cite this chapter

Heiberger, R.M., Holland, B. (2015). One-Way Analysis of Variance. In: Statistical Analysis and Data Display. Springer Texts in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2122-5_6

DOI: https://doi.org/10.1007/978-1-4939-2122-5_6
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2121-8
Online ISBN: 978-1-4939-2122-5
eBook Packages: Mathematics and Statistics (R0)
