Abstract
High-throughput biotechnology has enabled genome-wide investigation of gene expression and has the potential to identify genes that play a role in focal cerebral ischemia, as well as in many other conditions and interventions. The advent of this technology has also led to the generation of large amounts of expensive and complex expression data. A major problem with generating so much data is locating and extracting, effectively and reliably, the information relevant to target identification and interpretation. Statistical input is therefore vital. It not only helps to ensure that information is extracted effectively from the data but also, through careful experimental design, increases the likelihood that the data collected contain information about the differential expression of interest in the first place. The goal of this chapter is to recommend an effective process for investigating gene expression data. Once an expression dataset has been appropriately generated and collected, we believe that routinely applying the following five stages leads to reliable results: (1) biological problem definition and design selection; (2) data examination, “preprocessing,” and reexamination; (3) data analysis step I: screening for differentially expressed genes; (4) data analysis step II: verifying differential expression; and (5) biological verification, interpretation, and communication.
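As an illustration of stages (2) and (3) only, the Python sketch below log2-transforms each array, centres it on its own median, and then screens genes with Welch's two-sample t statistic against a fixed cut-off. The median normalization, the Welch statistic, and the cut-off of 4.0 (the hypothetical parameter `t_cutoff`) are assumptions chosen for brevity, not the procedure recommended in the chapter; a real analysis would follow such a screen with the verification of stage (4).

```python
import math
import statistics


def log2_median_normalize(arrays):
    """Preprocessing sketch (assumed method): log2-transform each array, then
    subtract that array's median so every array is centred on zero."""
    normalized = []
    for intensities in arrays:
        logged = [math.log2(x) for x in intensities]  # intensities must be > 0
        centre = statistics.median(logged)
        normalized.append([v - centre for v in logged])
    return normalized


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances).
    Needs at least two values per group."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        raise ValueError("no within-group variation; t is undefined")
    return (statistics.mean(a) - statistics.mean(b)) / se


def screen_genes(control, treated, gene_ids, t_cutoff=4.0):
    """Screening sketch: return (gene, t) pairs with |t| >= t_cutoff,
    largest |t| first. Each array is a list of raw intensities in gene_ids
    order. t_cutoff is a hypothetical, illustrative threshold."""
    ctrl = log2_median_normalize(control)
    trt = log2_median_normalize(treated)
    hits = []
    for g, gene in enumerate(gene_ids):
        # Positive t means higher normalized expression in the treated group.
        t = welch_t([arr[g] for arr in trt], [arr[g] for arr in ctrl])
        if abs(t) >= t_cutoff:
            hits.append((gene, t))
    hits.sort(key=lambda hit: abs(hit[1]), reverse=True)
    return hits
```

Because each array is centred on its own median, a uniform brightness shift between arrays (for example, every treated array scanned twice as bright as its control) cancels out, so genes whose relative level is unchanged give t close to zero and only genes whose relative expression moves can pass the screen.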
Copyright information
© 2005 Humana Press Inc., Totowa, NJ
About this protocol
Cite this protocol
Nelson, P.R., Goulter, A.B., Davis, R.J. (2005). Effective Analysis of Genomic Data. In: Read, S.J., Virley, D. (eds) Stroke Genomics. Methods in Molecular Medicine, vol 104. Humana Press. https://doi.org/10.1385/1-59259-836-6:285
DOI: https://doi.org/10.1385/1-59259-836-6:285
Publisher Name: Humana Press
Print ISBN: 978-1-58829-333-6
Online ISBN: 978-1-59259-836-6
eBook Packages: Springer Protocols