Clustering Microarray Data to Determine Normalization Method

Vendettuoli, Marie; Doyle, Erin; Hofmann, Heike

doi:10.1007/978-1-4419-7046-6_15

Part of the book series: Advances in Experimental Medicine and Biology (AEMB, volume 696)

Abstract

Most scientific journals require published microarray experiments to meet the Minimum Information About a Microarray Experiment (MIAME) standards, which ensure that other researchers have the information they need to interpret or reproduce the results. Required MIAME information includes raw experimental data, processed data, and data processing procedures. However, the normalization method is often reported inaccurately or not at all, and the scaling factor may be unknown to all but experienced users of the normalization software. We propose that, by using a seeded clustering algorithm, researchers can identify or verify unknown or doubtful normalization information. To this end, we generate descriptive statistics (mean, variance, quantiles, and moments) for normalized expression data from gene chip experiments available in the ArrayExpress database and cluster the chips based on these statistics. To verify that the clustering groups chips by normalization method, we normalize raw data for chips chosen from ArrayExpress experiments using multiple methods, generate the same descriptive statistics for the normalized data, and cluster the chips using these statistics. We then use this dataset of known pedigree as seeding data to identify the normalization methods used in unknown or doubtful cases.
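The workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, so the sketch assumes a simple seeded k-means-style procedure in which chips of known normalization method supply the initial centroids. The function names (`describe`, `seeded_cluster`), the labels `method_a` and `method_b`, and the synthetic Gaussian values standing in for normalized expression data are all hypothetical.

```python
import math
import random
import statistics


def describe(values):
    """Summarise one chip's normalized expression values as a feature vector."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    sd = math.sqrt(var)
    q1, median, q3 = statistics.quantiles(values, n=4)
    n = len(values)
    # Standardised third and fourth central moments (skewness and kurtosis).
    skew = sum((v - mean) ** 3 for v in values) / (n * sd ** 3)
    kurt = sum((v - mean) ** 4 for v in values) / (n * sd ** 4)
    return [mean, var, q1, median, q3, skew, kurt]


def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    return [statistics.fmean(column) for column in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_cluster(seeds, unknown, iterations=10):
    """Assign each unknown chip to the nearest seeded cluster.

    seeds:   dict mapping a method label to feature vectors of known pedigree
    unknown: list of feature vectors whose normalization method is in doubt
    Assumption: centroids start from the seed chips and are refined k-means
    style; seed chips always keep their known label.
    """
    centroids = {label: centroid(vecs) for label, vecs in seeds.items()}
    assignment = []
    for _ in range(iterations):
        new_assignment = [
            min(centroids, key=lambda label: sq_dist(vec, centroids[label]))
            for vec in unknown
        ]
        if new_assignment == assignment:
            break  # assignments are stable, stop iterating
        assignment = new_assignment
        for label, vecs in seeds.items():
            members = vecs + [v for v, a in zip(unknown, assignment) if a == label]
            centroids[label] = centroid(members)
    return assignment


# Synthetic stand-ins for two normalization outputs (not real microarray data).
rng = random.Random(0)


def fake_chip(mu, sigma, n=200):
    return [rng.gauss(mu, sigma) for _ in range(n)]


seeds = {
    "method_a": [describe(fake_chip(8.0, 2.0)) for _ in range(3)],
    "method_b": [describe(fake_chip(0.0, 1.0)) for _ in range(3)],
}
unknown = [
    describe(fake_chip(8.0, 2.0)),
    describe(fake_chip(0.0, 1.0)),
    describe(fake_chip(8.0, 2.0)),
]
labels = seeded_cluster(seeds, unknown)
print(labels)
```

The sketch leaves the features unscaled, so statistics with large numeric ranges (such as the variance) dominate the distance. With real expression data it would probably be necessary to standardise each feature before clustering.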

Author information

Authors and Affiliations

Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA, 50010, USA
Marie Vendettuoli
Department of Statistics, Iowa State University, Ames, IA, 50010, USA
Marie Vendettuoli

Corresponding author

Correspondence to Marie Vendettuoli.

Editor information

Editors and Affiliations

Dept. Computer Science, University of Georgia, Athens, 30602-7404, Georgia, USA
Hamid R. Arabnia
Department of Computer Science, Lamar University, Beaumont, 77710, Texas, USA
Quoc-Nam Tran

About this chapter

Cite this chapter

Vendettuoli, M., Doyle, E., Hofmann, H. (2011). Clustering Microarray Data to Determine Normalization Method. In: Arabnia, H., Tran, QN. (eds) Software Tools and Algorithms for Biological Systems. Advances in Experimental Medicine and Biology, vol 696. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7046-6_15

DOI: https://doi.org/10.1007/978-1-4419-7046-6_15
Published: 15 March 2011
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7045-9
Online ISBN: 978-1-4419-7046-6
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)
