Introduction to the Statistical Analysis of Two-Color Microarray Data
Microarray experiments have become routine in the past few years in many fields of biology. Analysis of array hybridizations is often performed with the help of commercial software programs, which produce gene lists and graphs and sometimes provide values for the statistical significance of the results. Exactly what many of the available programs compute is often difficult to reconstruct, and may even be impossible for the end user to determine. It is therefore not surprising that many biology students and some researchers using microarray data do not fully understand the statistics used to arrive at the results.
We have developed a module, used successfully in undergraduate biology and statistics education, that gives students a better understanding of the basic biological and statistical theory needed to comprehend primary microarray data. The module is intended for the undergraduate level but may be useful to anyone who is new to the field of microarray biology. Additional course material that was developed for classroom use can be found at http://www.polyploidy.org/.
In our undergraduate classrooms we encourage students to manipulate microarray data using Microsoft Excel to reinforce some of the concepts they learn. We have included instructions for some of these manipulations throughout this chapter (see the “Do this…” boxes). However, it should be noted that while Excel can effectively analyze our small sample data set, more specialized software would typically be used to analyze full microarray data sets. Nevertheless, we believe that manipulating a small data set with Excel can provide insights into the workings of more advanced analysis software.
Key words: Microarray, variation, variance, normalization, dye-swap, t distribution, t-test, ANOVA, Bonferroni method, false discovery rate (FDR)
We acknowledge the advice of our colleagues of the “Polyploidy Research Group.” Special thanks go to RW Doerge for advice, support, and critical reading of the manuscript. This work was supported by NSF Plant Genome grant DBI-0501712.