Sample Size and Power Calculation for Molecular Biology Studies

  • Sin-Ho Jung
Part of the Methods in Molecular Biology book series (MIMB, volume 620)


Sample size calculation is a critical step in designing a new biological study. In this chapter, we consider molecular biology studies that generate high-dimensional data. Microarray studies are typical examples, so we present the chapter in terms of gene microarray data, but the methods discussed can be used for the design and analysis of any molecular biology study involving high-dimensional data. We discuss sample size calculation methods for studies in which prognostic molecular markers are discovered while accurately controlling the false discovery rate (FDR) or the family-wise error rate (FWER) in the final data analysis. We limit our discussion to the two-sample case.

Key words

False discovery rate, family-wise error rate, prognostic gene, true rejection, two-sample t-test



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Sin-Ho Jung
    Department of Biostatistics and Bioinformatics, Duke University, Durham, USA
