Effective Analysis of Genomic Data

  • Paul R. Nelson
  • Andrew B. Goulter
  • Richard J. Davis
Part of the Methods in Molecular Medicine book series (MIMM, volume 104)


High-throughput biotechnology has enabled genome-wide investigation of gene expression and has the potential to identify genes that play a role in focal cerebral ischemia, as well as in many other interventions. The advent of this technology has also led to the generation of large amounts of expensive and complex expression data. A major problem with generating so much data is locating and extracting, effectively and reliably, the information relevant to target identification and interpretation. Statistical involvement is vital. Not only does it help to ensure effective extraction of information from the data, it also increases the likelihood that the data collected will embody the information about the differential expression of interest in the first place. The goal of this chapter is to recommend an effective process for investigating gene expression data. We believe that the five stages of this process lead to reliable results when routinely applied to an expression dataset that has been appropriately generated and collected: (1) biological problem definition and design selection; (2) data examination, “preprocessing,” and reexamination; (3) data analysis step I: screening for differentially expressed genes; (4) data analysis step II: verifying differential expression; and (5) biological verification, interpretation, and communication.
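
As a rough illustration of stage (3), screening for differentially expressed genes, the sketch below flags genes using a log2 fold difference together with a Welch t statistic. It is not the chapter's own procedure: the function name `screen_genes`, the thresholds, and the toy data are hypothetical, and Welch's t statistic is used here only as a simple stand-in for a univariate screen.

```python
import math
from statistics import mean, variance


def screen_genes(control, treated, min_abs_log2_fold=1.0, min_abs_t=2.0):
    """Return (gene, log2_fold, t_stat) for genes passing both thresholds.

    control / treated: dict mapping gene name -> list of log2 expression
    values (at least two replicates per group). Thresholds are illustrative
    assumptions, not recommendations from the chapter.
    """
    hits = []
    for gene in control:
        c, t = control[gene], treated[gene]
        # On the log2 scale, a fold difference is a difference of group means.
        log2_fold = mean(t) - mean(c)
        # Welch standard error: does not assume equal group variances.
        se = math.sqrt(variance(c) / len(c) + variance(t) / len(t))
        if se == 0:
            # No within-group variation: the statistic is undefined, so skip.
            continue
        t_stat = log2_fold / se
        if abs(log2_fold) >= min_abs_log2_fold and abs(t_stat) >= min_abs_t:
            hits.append((gene, log2_fold, t_stat))
    return hits


# Toy data (hypothetical): three replicates per group, already log2-transformed.
control = {"geneA": [5.0, 5.2, 4.8], "geneB": [7.0, 7.1, 6.9]}
treated = {"geneA": [7.1, 6.9, 7.0], "geneB": [7.0, 7.2, 6.8]}

for gene, log2_fold, t_stat in screen_genes(control, treated):
    print(gene, round(log2_fold, 2), round(t_stat, 2))
```

A screen like this only produces candidates. With thousands of genes tested at once, some will pass by chance, which is one reason a separate verification stage follows the screening stage.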

Key Words

Differential expression; experimental design; data examination; visualization; preprocessing; baseline features; data analysis; multivariate; principal components analysis; hierarchical cluster analysis; partial least squares discriminant analysis; regression coefficients; variable influence on projection; univariate; analysis of variance; covariate; fold difference



Copyright information

© Humana Press Inc., Totowa, NJ 2005

Authors and Affiliations

  • Paul R. Nelson (1)
  • Andrew B. Goulter (2)
  • Richard J. Davis (3)

  1. Prism Training and Consultancy Ltd., Cambridge, UK
  2. Exploratory Target Profiling, Pharmagene plc, Royston, UK
  3. Pharmagene plc, Royston, UK
