
Part of the book series: Springer Texts in Statistics (STS)


Abstract

In Chapter 5 we considered ways to compare the means of two populations. Now we extend these procedures to comparisons of means from several populations. For example, we may wish to compare the average hourly production of a company’s six factories. We say that the investigation has a factor, factory, that has six levels, namely the six identifiers distinguishing the factories from one another. Or we may wish to compare the yields per acre of five different varieties of wheat. Here the factor is wheat, and the levels of wheat are variety1 through variety5. This chapter discusses investigations having a single factor. Experiments having two factors are discussed in Chapter 12, and situations with two or more factors are discussed in Chapters 13 and 14.




Appendices

6.A Appendix: Computation for the Analysis of Variance

Model formulas are expressed in R with a symbolic notation that is a simplification of the more extended traditional notation:

$$y_{ij} = \mu + \alpha_{i} + \epsilon_{ij} \quad \mbox{ for } i = 1,\ldots,a \mbox{ and } j = 1,\ldots,n_{i} \tag{6.1}$$

In the R formula, the intercept term \(\mu\) and the error term \(\epsilon_{ij}\) are assumed by default and are not written explicitly. The existence of the subscripts is likewise implied, and their actual values are specified by the data values.

With R we will be using aov for the calculations and anova and related commands for the display of the results. aov can be used with equal or unequal cell sizes \(n_{i}\). Model (6.1) is denoted in R by the formula

   Y ~ A 

The operator ~ is read as “is modeled by”.
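A minimal sketch of this workflow follows. It assumes R's built-in PlantGrowth data (the numeric response weight for three groups of ten plants) as a stand-in for the chapter's catalystm example, whose values are not reproduced here; the formula weight ~ group plays the role of Y ~ A.

```r
## Minimal one-way ANOVA sketch. PlantGrowth (weight by group) is an
## assumed stand-in for the chapter's catalystm data.
data(PlantGrowth)
levels(PlantGrowth$group)     # "ctrl" "trt1" "trt2"

## weight ~ group plays the role of Y ~ A in Model (6.1):
## mu and epsilon_ij are implied; group supplies the effects alpha_i.
fit <- aov(weight ~ group, data = PlantGrowth)

summary(fit)   # ANOVA table from the summary method for "aov" objects
anova(fit)     # the same table from the anova method for "lm" objects
```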

Two different algorithms are used to calculate the analysis of variance for data with one factor: sums of squared differences of cell means and regression on dummy variables. Both give identical results.

The intuition for the analysis is most easily developed with the sums-of-squared-differences algorithm. We began there with Equation 6.6 and the definitions in the notes to Table 6.2. Table 6.10 shows the partitioning of the observed values of the response variable concent in the catalystm example into columns associated with the terms in the model. The sum of each row reproduces the response variable; this is called the linear identity. The sums of squares of the columns are the entries in the ANOVA table; this is called the quadratic identity. In the notation of Table 6.2, the numbers in the (Intercept) column are \(\bar{\bar{y}}\), the numbers in the catalyst column are the treatment effects \(\bar{y}_{i} -\bar{\bar{y}}\), and the numbers in the Residuals column are \(y_{ij} -\bar{y}_{i}\). The numbers in the result of the apply statement in Table 6.10 are the sums of squares \(\sum_{ij}\bar{\bar{y}}^{2}\), \(\mathsf{SS}_{\mathrm{Tr}} =\sum_{i=1}^{a}n_{i}(\bar{y}_{i} -\bar{\bar{y}})^{2}\), \(\mathsf{SS}_{\mathrm{Res}} =\sum_{i=1}^{a}\sum_{j=1}^{n_{i}}(y_{ij} -\bar{y}_{i})^{2}\), and their total \(\sum_{ij}y_{ij}^{2}\). We return to the linear and quadratic identities in Table 8.6.

Table 6.10 Linear and quadratic identities for the one-way Analysis of Variance. The column labeled Sum is the sum of the three columns of the projection matrix: the projections onto the space of the Grand Mean (labeled (Intercept)), onto the effects due to the factor catalyst, and onto the Residuals. The Sum column is identical to the observed response variable concent. The sum of squares of each column of the projection matrix is the number in the identically labeled row of the “Sum of Squares” column of the ANOVA table.
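The linear and quadratic identities can be checked numerically with R's proj function, which splits each observation of an aov fit into one column per model term. This is a sketch that again assumes the built-in PlantGrowth data rather than catalystm, so its numbers will not match Table 6.10.

```r
## Sketch of the linear and quadratic identities; PlantGrowth is an
## assumed stand-in for catalystm.
fit <- aov(weight ~ group, data = PlantGrowth)

## One column per term: (Intercept), group, Residuals.
p <- proj(fit)
head(cbind(p, Sum = rowSums(p), weight = PlantGrowth$weight))

## Linear identity: each row sums to the observed response.
all.equal(unname(rowSums(p)), PlantGrowth$weight)   # TRUE

## Quadratic identity: column sums of squares give the ANOVA sums of squares,
## and all three together add up to sum(y^2).
ss <- apply(p, 2, function(col) sum(col^2))
ss
anova(fit)[["Sum Sq"]]                              # same as ss[c("group", "Residuals")]
all.equal(sum(ss), sum(PlantGrowth$weight^2))       # TRUE
```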

The regression formulation is easier to work with and generalizes better. Once we have developed our intuition, we will usually work with the regression formulation. The discussion of contrasts in Section 6.9 leads into the regression formulation in Chapter 10. For the moment, in Table 6.11 we step forward into the notation of Chapter 10 and express the catalystm example in regression notation.

Table 6.11 The aov on the factor catalyst in Table 6.10 is identical to the lm shown here on the three dummy variables generated from the catalyst factor. The degrees of freedom (1 + 1 + 1 = 3) and the sums of squares (8.8 + 2.7 + 74.1 = 85.7, up to rounding) are both the same.
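The equivalence of the two algorithms can be sketched by coding the dummy variables by hand. This example assumes PlantGrowth (three levels, hence two dummy variables with ctrl as the baseline) instead of catalystm (four levels, three dummies); the dummy names trt1 and trt2 are choices made for this illustration.

```r
## Sketch: one-way ANOVA as regression on dummy variables.
## PlantGrowth is an assumed stand-in for catalystm.
fit.aov <- aov(weight ~ group, data = PlantGrowth)

## Hand-built treatment-coded dummies (ctrl is the baseline level).
d <- data.frame(weight = PlantGrowth$weight,
                trt1 = as.numeric(PlantGrowth$group == "trt1"),
                trt2 = as.numeric(PlantGrowth$group == "trt2"))
fit.lm <- lm(weight ~ trt1 + trt2, data = d)

a.lm  <- anova(fit.lm)    # one sequential row per dummy variable
a.aov <- anova(fit.aov)   # one row for the factor
a.lm
a.aov

## The dummy-variable rows add up to the factor row.
sum(a.lm[c("trt1", "trt2"), "Df"])        # equals a.aov["group", "Df"]
sum(a.lm[c("trt1", "trt2"), "Sum Sq"])    # equals a.aov["group", "Sum Sq"]
```

Because anova on an lm fit reports sequential sums of squares, the individual dummy-variable rows depend on the order of the terms, but their total does not.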

6.B Object Oriented Programming

Many of R’s functions are designed to be sensitive to the class of object to which they are applied. Figure 6.7 shows that the same syntax plot(x) produces a different form of plot depending on the class of the argument x.

Fig. 6.7

The three columns of the data.frame tmp have three different classes. The plot function is sensitive to the class of its argument and draws a different style of plot for each of these classes. The integer object (more generally, a numeric object) is plotted as a scatterplot with the index on the horizontal axis. The factor object is plotted as a barchart with the level names on the horizontal axis. The time series object is plotted as a line graph with the time value on the horizontal axis.
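The dispatch mechanism behind plot(x) is R's S3 generic-function system: a generic calls UseMethod, which selects a method from the class of its first argument. The sketch below uses a hypothetical generic describe, which returns a label instead of drawing, and a list standing in for the data.frame tmp (its component names are assumptions, not those of Figure 6.7).

```r
## Hypothetical generic `describe`, used only to illustrate S3 dispatch;
## it returns a label instead of drawing a plot.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "scatterplot against the index"
describe.factor  <- function(x, ...) "barchart with level names"
describe.ts      <- function(x, ...) "line graph against time"

## A list standing in for tmp: three components with three classes.
tmp <- list(i = 1:5,
            f = factor(c("a", "b", "b", "c", "c")),
            t = ts(c(3, 1, 4, 1, 5), start = 2001))

sapply(tmp, function(col) class(col)[1])   # integer, factor, ts
sapply(tmp, describe)                      # integer falls through to the default method
```

In base R, the real methods that play these roles include plot.default, plot.factor, and plot.ts.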

The result of a function call (aov for example) is an object with a class ("aov"). Accessor functions such as summary or plot are sensitive to the class of their argument and produce an appropriate form of output as shown in Figures 6.7 and 6.8.

Fig. 6.8

The two accessor functions summary and plot are sensitive to the class of their argument and produce a form of output appropriate to the argument, in this case an "aov" object. Note that "aov" objects are a special case of "lm" objects. The summary function for an "aov" object produces an ANOVA table. The plot function for an "lm" object produces a set of four diagnostic plots of the residuals from the fitted model. The contents of the panels of the plot are discussed in Sections 8.4 and 11.3.7.
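A short sketch of the same dispatch for a fitted model, again assuming the PlantGrowth data as a stand-in: class returns both classes, summary finds the "aov" method, and plot falls back to the "lm" method.

```r
## Sketch: an "aov" object is also an "lm" object, so generics dispatch to an
## aov method when one exists and otherwise to the lm method.
## PlantGrowth is an assumed stand-in for the chapter's data.
fit <- aov(weight ~ group, data = PlantGrowth)

class(fit)            # "aov" "lm"
inherits(fit, "lm")   # TRUE

summary(fit)          # summary method for "aov": the ANOVA table

## plot() uses the "lm" method, which draws the four residual
## diagnostic panels; arrange them in a 2 x 2 grid.
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```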


Copyright information

© 2015 Springer Science+Business Media New York

About this chapter

Cite this chapter

Heiberger, R.M., Holland, B. (2015). One-Way Analysis of Variance. In: Statistical Analysis and Data Display. Springer Texts in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2122-5_6
