Microarray Data Normalization and Robust Detection of Rhythmic Features

  • Protocol

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1986))

Abstract

Data derived from microarray technologies are generally subject to various sources of noise, and accordingly the raw data are pre-processed before formal analysis. Data normalization is a key pre-processing step in microarray experiments, such as circadian gene-expression studies, since it removes systematic variation across arrays. A wide variety of normalization methods are available in the literature. However, in our experience studying rhythmic expression patterns in oscillatory systems (e.g. cell cycle, circadian clock), the choice of normalization method may substantially impair the identification of rhythmic genes. Hence, the identification of a gene as rhythmic could be merely an artefact of how the data were normalized. Yet the detection of gene rhythmicity is crucial in modern toxicological and pharmacological studies, so a procedure for identifying truly rhythmic genes that is robust to the choice of normalization method is required.
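As an illustration of how normalization removes systematic variation across arrays, the following minimal sketch applies quantile normalization, one widely used method, to simulated data. It is not taken from the chapter: the function name `quantile_normalize`, the simulated intensities and the artificial threefold shift on one array are all assumptions made for the example.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns (arrays) of a genes x arrays matrix.

    Each column is forced to share the same empirical distribution: the
    mean of the sorted values across arrays. Ties are not treated specially.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                 # per-array sort order
    reference = np.sort(x, axis=0).mean(axis=1)   # mean value at each rank
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference           # assign reference by rank
    return out


# Hypothetical data: 200 genes on 4 arrays, one array systematically brighter.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=5.0, sigma=1.0, size=(200, 4))
raw[:, 2] *= 3.0                                  # simulated array effect
log_raw = np.log2(raw)
normalized = quantile_normalize(log_raw)

print("median per array before:", np.round(np.median(log_raw, axis=0), 2))
print("median per array after: ", np.round(np.median(normalized, axis=0), 2))
```

After normalization every array has the same distribution of values, so the artificial shift on the third array disappears, while the ranking of genes within each array is unchanged.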

To detect rhythmic features, we propose a rhythmicity measure based on bootstrap methodology that robustly identifies rhythmic genes in oscillatory systems. Although our methodology can be extended to any high-throughput experiment, in this chapter we illustrate how to apply it to a publicly available circadian clock microarray gene-expression data set and give full statistical and computational details so that the methodology can be applied easily. We show that the choice of normalization method has very little effect on the proposed methodology, since the results derived from the bootstrap-based rhythmicity measure are highly rank correlated for every pair of normalization methods considered. This suggests, on the one hand, that the proposed rhythmicity measure is robust to the choice of normalization method and, on the other hand, that gene rhythmicity detected using this measure is less likely to be a mere artefact of the normalization method used. A researcher using this methodology is therefore protected against the possible effects of different normalizations, as the conclusions obtained depend far less on them. Additionally, the described bootstrap methodology can be employed as a tool to simulate, from a reference data set, the expression of genes participating in an oscillatory system.
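The rank-correlation check described above can be sketched as follows. This is only an illustration under stated assumptions, not the measure defined in the chapter: the rhythmicity statistic is a stand-in (the R² of a single-harmonic least-squares fit with an assumed 24 h period, averaged over residual-bootstrap replicates), the two normalizations (median centring and quantile normalization) are chosen for the example, and all names and simulated data are hypothetical.

```python
import numpy as np

PERIOD = 24.0  # assumed circadian period in hours


def cosinor_r2(y, t):
    """Return (R^2, fitted values) of a least-squares single-harmonic fit."""
    w = 2.0 * np.pi * t / PERIOD
    X = np.column_stack([np.ones_like(t), np.cos(w), np.sin(w)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum((y - fitted) ** 2) / ss_tot if ss_tot > 0 else 0.0
    return r2, fitted


def bootstrap_score(y, t, rng, n_boot=100):
    """Stand-in rhythmicity score: mean R^2 over residual-bootstrap replicates."""
    _, fitted = cosinor_r2(y, t)
    resid = y - fitted
    r2_values = []
    for _ in range(n_boot):
        y_star = fitted + rng.choice(resid, size=resid.size, replace=True)
        r2_values.append(cosinor_r2(y_star, t)[0])
    return float(np.mean(r2_values))


def median_center(x):
    """Subtract each array's median (columns are arrays / time points)."""
    return x - np.median(x, axis=0, keepdims=True)


def quantile_normalize(x):
    """Give every array the mean sorted profile across arrays."""
    order = np.argsort(x, axis=0)
    reference = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference
    return out


def spearman(a, b):
    """Spearman correlation via ordinal ranks (ties not averaged)."""
    rank = lambda v: np.argsort(np.argsort(v))
    return float(np.corrcoef(rank(a), rank(b))[0, 1])


# Hypothetical data: 100 genes sampled every 4 h for 48 h; the first half
# are rhythmic, and every array carries its own additive offset.
rng = np.random.default_rng(1)
t = np.arange(0.0, 48.0, 4.0)
n_genes = 100
is_rhythmic = np.arange(n_genes) < n_genes // 2
phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_genes, 1))
signal = np.cos(2.0 * np.pi * t / PERIOD - phase) * is_rhythmic[:, None]
base = rng.normal(8.0, 1.0, size=(n_genes, 1))
array_effect = rng.normal(0.0, 0.5, size=(1, t.size))
expr = base + signal + array_effect + rng.normal(0.0, 0.3, size=(n_genes, t.size))

scores = {}
for name, normalize in [("median", median_center), ("quantile", quantile_normalize)]:
    normalized = normalize(expr)
    scores[name] = np.array([bootstrap_score(row, t, rng) for row in normalized])

rho = spearman(scores["median"], scores["quantile"])
print(f"Spearman rank correlation between normalizations: {rho:.3f}")
```

A rank correlation close to 1 would indicate that both normalizations order the genes by rhythmicity in nearly the same way, which is the sense of robustness the abstract describes. With real data, the scores would come from the chapter's own measure rather than this stand-in.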

Corresponding author

Correspondence to Yolanda Larriba.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Larriba, Y., Rueda, C., Fernández, M.A., Peddada, S.D. (2019). Microarray Data Normalization and Robust Detection of Rhythmic Features. In: Bolón-Canedo, V., Alonso-Betanzos, A. (eds) Microarray Bioinformatics. Methods in Molecular Biology, vol 1986. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9442-7_9

  • DOI: https://doi.org/10.1007/978-1-4939-9442-7_9

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-4939-9441-0

  • Online ISBN: 978-1-4939-9442-7

  • eBook Packages: Springer Protocols
