Abstract
It has been recognized that, for appropriately ordered data, hidden Markov models (HMMs) with local false discovery rate (FDR) control can increase the power to detect significant associations. For many high-throughput technologies, however, cost still limits their application. Two-stage designs are therefore attractive: a set of interesting features or biomarkers is identified in a first stage and then followed up in a second stage. To our knowledge, no two-stage FDR control procedure based on HMMs has been developed. In this paper, we study an efficient HMM–FDR-based two-stage design that uses a simple integrated analysis procedure across the stages. Numerical studies show excellent performance compared with available methods. We also propose a power analysis method, and we illustrate the methods with examples from microbiome data.
Acknowledgements
This work was supported by R21HG007840.
Additional information
Yi-Hui Zhou and Xiaoshan Wang have contributed equally to this work.
About this article
Cite this article
Zhou, YH., Brooks, P. & Wang, X. A Two-Stage Hidden Markov Model Design for Biomarker Detection, with Application to Microbiome Research. Stat Biosci 10, 41–58 (2018). https://doi.org/10.1007/s12561-017-9187-y