
A Two-Stage Hidden Markov Model Design for Biomarker Detection, with Application to Microbiome Research

Statistics in Biosciences

Abstract

It has been recognized that, for appropriately ordered data, hidden Markov models (HMMs) with local false discovery rate (FDR) control can increase the power to detect significant associations. For many high-throughput technologies, however, cost still limits large-scale application. Two-stage designs, in which a set of interesting features or biomarkers is identified in a first stage and then followed up in a second stage, are therefore attractive. However, to our knowledge, no two-stage FDR control procedure based on HMMs has been developed. In this paper, we study an efficient HMM–FDR-based two-stage design that uses a simple integrated analysis procedure across the stages. Numerical studies show its excellent performance compared with available methods. We also propose a power analysis method, and we use examples from microbiome data to illustrate the methods.
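The core single-stage ingredient mentioned above, HMM-based local FDR control for ordered features, can be illustrated with a minimal sketch. It assumes a two-state HMM with known parameters, a standard normal null N(0, 1) and a normal alternative N(mu1, sigma1^2), computes each feature's posterior null probability (the local index of significance, LIS) by scaled forward-backward recursions, and then applies the adaptive step-up rule of Sun and Cai: reject the k features with the smallest LIS, where k is the largest count whose running mean of sorted LIS values stays at or below the nominal level. This is not the paper's two-stage design or its integrated cross-stage analysis, and it omits parameter estimation (e.g. by EM); the function names, parameter values, and toy z-scores are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def hmm_null_posteriors(z, pi0, trans, mu1, sigma1):
    """Posterior P(state_i = null | z_1..z_n) for a two-state HMM (the LIS).

    Assumed, illustrative model: state 0 (null) emits N(0, 1); state 1
    (non-null) emits N(mu1, sigma1^2); trans[a][b] = P(next = b | current = a);
    pi0 = P(first state = null). Parameters are treated as known.
    """
    n = len(z)
    emit = [(normal_pdf(x, 0.0, 1.0), normal_pdf(x, mu1, sigma1)) for x in z]
    # Forward pass, normalized at each step to avoid numerical underflow.
    fwd, scales = [], []
    for i in range(n):
        if i == 0:
            a = [(pi0, 1.0 - pi0)[s] * emit[0][s] for s in (0, 1)]
        else:
            prev = fwd[i - 1]
            a = [sum(prev[r] * trans[r][s] for r in (0, 1)) * emit[i][s]
                 for s in (0, 1)]
        c = a[0] + a[1]
        fwd.append([v / c for v in a])
        scales.append(c)
    # Backward pass, rescaled with the same per-step constants.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * emit[i + 1][r] * bwd[i + 1][r] for r in (0, 1))
             for s in (0, 1)]
        bwd[i] = [v / scales[i + 1] for v in b]
    # Combine forward and backward terms; normalize to the null posterior.
    lis = []
    for i in range(n):
        g0 = fwd[i][0] * bwd[i][0]
        g1 = fwd[i][1] * bwd[i][1]
        lis.append(g0 / (g0 + g1))
    return lis


def lis_reject(lis, alpha):
    """Adaptive step-up rule: reject the k smallest LIS values, where k is the
    largest count whose running mean of sorted LIS values is <= alpha."""
    order = sorted(range(len(lis)), key=lambda i: lis[i])
    running, k = 0.0, 0
    for rank, idx in enumerate(order, start=1):
        running += lis[idx]
        if running / rank <= alpha:
            k = rank
    return sorted(order[:k])


if __name__ == "__main__":
    # Hypothetical z-scores for features in their natural order.
    z = [0.1, -0.4, 3.2, 3.8, 2.9, 0.3, -1.1, 0.2]
    lis = hmm_null_posteriors(z, pi0=0.9, trans=[[0.95, 0.05], [0.2, 0.8]],
                              mu1=3.0, sigma1=1.0)
    print(lis_reject(lis, alpha=0.10))  # [2, 3, 4] for these toy values
```

Because each LIS is a posterior null probability, the mean of the k smallest values estimates the FDR of rejecting exactly those k features, which is why the rule thresholds a running mean rather than individual p-values. The per-step rescaling leaves the posterior ratios unchanged while keeping long sequences from underflowing.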


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6



Acknowledgements

This work was supported by R21HG007840.

Author information


Correspondence to Yi-Hui Zhou.

Additional information

Yi-Hui Zhou and Xiaoshan Wang contributed equally to this work.

Electronic supplementary material


Supplementary material 1 (pdf 687 KB)


About this article


Cite this article

Zhou, YH., Brooks, P. & Wang, X. A Two-Stage Hidden Markov Model Design for Biomarker Detection, with Application to Microbiome Research. Stat Biosci 10, 41–58 (2018). https://doi.org/10.1007/s12561-017-9187-y


  • DOI: https://doi.org/10.1007/s12561-017-9187-y
