Promises and Pitfalls of High-Throughput Biological Assays

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1415)

Abstract

This chapter discusses some of the pitfalls encountered when performing biomedical research involving high-throughput “omics” data and presents some strategies and guidelines that researchers should follow when undertaking such studies. We discuss common errors in experimental design and data analysis that lead to irreproducible and non-replicable research and provide some guidelines to avoid these common mistakes so that researchers may have confidence in study outcomes, even if the results are negative. We discuss the importance of ranking and prespecifying hypotheses, performing power analysis, careful experimental design, and preplanning of statistical analyses in order to avoid the “fishing expedition” data analysis strategy, which is doomed to fail. The impact of multiple testing on false-positive rates is discussed, particularly in the context of the analysis of high-throughput data, and methods to correct for it are presented, as well as approaches to detect and correct for experimental biases and batch effects, which often plague high-throughput assays. We highlight the importance of sharing data and analysis code to facilitate reproducibility and present tools and software that are appropriate for this purpose.
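As a rough illustration of why multiple testing inflates false-positive rates, the sketch below computes the chance of at least one false positive across many independent tests and compares two widely used corrections, Bonferroni and Benjamini-Hochberg. The function names and the example p-values are hypothetical and chosen only for illustration; this is not the chapter's own code, and the calculation assumes independent tests.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one false positive when every null
    hypothesis is true and the tests are independent (an assumption)."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_reject(pvalues, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    threshold = alpha / len(pvalues)
    return [p <= threshold for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false discovery rate control."""
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    reject = [False] * m
    for idx in order[:max_rank]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    example = [0.001, 0.008, 0.039, 0.041, 0.042,
               0.060, 0.074, 0.205, 0.212, 0.216]
    print("FWER for 100 uncorrected tests:", familywise_error_rate(100))
    print("Bonferroni rejections:", sum(bonferroni_reject(example)))
    print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg_reject(example)))
```

With 100 uncorrected tests at the 0.05 level, at least one false positive is almost certain under these assumptions. On the example values, Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two, which reflects the usual trade-off between strict family-wise error control and the less conservative false discovery rate control.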

Acknowledgments

This work was supported by a Bill and Melinda Gates Foundation grant, the Vaccine Immunology Statistical Center, and NIH grants U01 AI068635-01, U19 AI089986-01, and R01 EB00840-08.

Author information

Corresponding author

Correspondence to Raphael Gottardo.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Finak, G., Gottardo, R. (2016). Promises and Pitfalls of High-Throughput Biological Assays. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 1415. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3572-7_12

  • DOI: https://doi.org/10.1007/978-1-4939-3572-7_12

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3570-3

  • Online ISBN: 978-1-4939-3572-7

  • eBook Packages: Springer Protocols
