Abstract
Most scientific journals require published microarray experiments to meet the Minimum Information About a Microarray Experiment (MIAME) standards, which ensure that other researchers have the information they need to interpret or reproduce the results. Required MIAME information includes the raw experimental data, the processed data, and the data processing procedures. However, the normalization method is often reported inaccurately or not at all, and the scaling factor may not even be known to anyone but experienced users of the normalization software. We propose that researchers can use a seeded clustering algorithm to identify normalization information that is unknown or to verify information that is doubtful. To this end, we compute descriptive statistics (mean, variance, quantiles, and moments) of the normalized expression data from gene chip experiments available in the ArrayExpress database and cluster the chips on these statistics. To verify that the clustering groups chips by normalization method, we normalize the raw data of chips chosen from ArrayExpress experiments with multiple methods, compute the same descriptive statistics for the normalized data, and cluster the chips on them. We then use this dataset of known pedigree as seeding data to identify the normalization method in unknown or doubtful cases.
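The pipeline in the abstract has two parts: summarize each chip by a vector of descriptive statistics, then use chips whose normalization is known as seeds for labeling chips whose normalization is not. The Python sketch here illustrates that idea under loudly stated assumptions. The quantile probabilities, the two toy normalizations (`log2_transform`, and `median_scale` with a target of 500), and all helper names are hypothetical. The seeded step is done by nearest-centroid assignment in z-scored feature space, a simple stand-in rather than the authors' clustering procedure. The demo data are synthetic log-normal intensities, not ArrayExpress chips.

```python
import math
import random
import statistics

# Hypothetical quantile probabilities; the chapter's exact choice is not stated here.
PROBS = (0.05, 0.25, 0.5, 0.75, 0.95)


def chip_features(values, probs=PROBS):
    """Summarize one chip: mean, variance, quantiles, skewness, kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    sd = math.sqrt(var)
    ordered = sorted(values)
    quants = []
    for p in probs:
        # Linear interpolation between order statistics (one of several quantile definitions).
        pos = p * (n - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)
        quants.append(ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo]))
    # Standardized 3rd and 4th central moments (0.0 if the chip is constant).
    skew = sum((v - mean) ** 3 for v in values) / (n * sd ** 3) if sd else 0.0
    kurt = sum((v - mean) ** 4 for v in values) / (n * sd ** 4) if sd else 0.0
    return [mean, var, *quants, skew, kurt]


def standardize(rows):
    """Z-score each feature column so no single statistic dominates the distance."""
    cols = list(zip(*rows))
    centers = [statistics.fmean(c) for c in cols]
    scales = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, centers, scales)] for row in rows]


def assign_by_seeds(seed_rows, seed_labels, query_rows):
    """Give each query chip the label of the nearest seed-group centroid (hypothetical stand-in)."""
    rows = standardize(seed_rows + query_rows)
    seeds, queries = rows[:len(seed_rows)], rows[len(seed_rows):]
    centroids = {}
    for label in sorted(set(seed_labels)):
        members = [r for r, lab in zip(seeds, seed_labels) if lab == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*members)]
    return [min(centroids, key=lambda lab: math.dist(q, centroids[lab])) for q in queries]


def log2_transform(raw):
    """Hypothetical stand-in for a method that reports log2-scale values."""
    return [math.log2(v) for v in raw]


def median_scale(raw, target=500.0):
    """Hypothetical stand-in for a scaling method: rescale so the median equals target."""
    factor = target / statistics.median(raw)
    return [v * factor for v in raw]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic raw intensities for six chips (not real microarray data).
    raw = [[rng.lognormvariate(6.0, 1.0) for _ in range(1000)] for _ in range(6)]
    seed_rows = [chip_features(log2_transform(raw[0])), chip_features(log2_transform(raw[1])),
                 chip_features(median_scale(raw[2])), chip_features(median_scale(raw[3]))]
    seed_labels = ["log2", "log2", "scaled", "scaled"]
    query_rows = [chip_features(log2_transform(raw[4])), chip_features(median_scale(raw[5]))]
    print(assign_by_seeds(seed_rows, seed_labels, query_rows))
```

Z-scoring the features before computing distances matters here: variances on a linear intensity scale are orders of magnitude larger than log-scale means, so without it one statistic would decide every assignment. A real analysis would replace the toy normalizations with the methods under study (for example `rma` or `mas5` from Bioconductor's `affy` package) and might use hierarchical clustering instead of nearest centroids.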
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Vendettuoli, M., Doyle, E., Hofmann, H. (2011). Clustering Microarray Data to Determine Normalization Method. In: Arabnia, H., Tran, QN. (eds) Software Tools and Algorithms for Biological Systems. Advances in Experimental Medicine and Biology, vol 696. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7046-6_15
DOI: https://doi.org/10.1007/978-1-4419-7046-6_15
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7045-9
Online ISBN: 978-1-4419-7046-6
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)