A Two-Stage Hidden Markov Model Design for Biomarker Detection, with Application to Microbiome Research

Zhou, Yi-Hui; Brooks, Paul; Wang, Xiaoshan

doi:10.1007/s12561-017-9187-y

Published: 10 February 2017

Volume 10, pages 41–58, (2018)

Statistics in Biosciences

Abstract

It has been recognized that, for appropriately ordered data, hidden Markov models (HMMs) with local false discovery rate (FDR) control can increase the power to detect significant associations. For many high-throughput technologies, cost still limits their application. Two-stage designs are therefore attractive: a set of interesting features or biomarkers is identified in a first stage and then followed up in a second stage. However, to our knowledge, no two-stage FDR control with HMMs has been developed. In this paper, we study an efficient HMM–FDR-based two-stage design, using a simple integrated analysis procedure across the stages. Numerical studies show its excellent performance when compared to available methods. A power analysis method is also proposed. We use examples from microbiome data to illustrate the methods.
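
The abstract does not spell out the procedure, so the following is only a rough, single-stage illustration of what local FDR control under an HMM can look like; it is not the two-stage design studied in the paper. It assumes a two-state chain (state 0 = null, state 1 = non-null), Gaussian emissions with a standard normal null, and known transition and emission parameters, all of which are simplifying assumptions made here. The function names `posterior_null_prob` and `reject_by_local_fdr` are hypothetical. The first runs a scaled forward-backward pass to get each feature's posterior probability of being null given the whole ordered sequence; the second applies a commonly used step-up rule that rejects the k features with the smallest posterior null probabilities, for the largest k whose running mean stays at or below the target level.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_null_prob(z, trans, init, alt_mean, alt_sd=1.0):
    """P(state_t = null | all z) for a two-state HMM via scaled forward-backward.

    Assumptions of this sketch: state 0 is null with N(0, 1) emissions,
    state 1 is non-null with N(alt_mean, alt_sd) emissions, and
    trans[i][j] = P(next state = j | current state = i) is known.
    """
    n = len(z)
    emis = [(normal_pdf(x, 0.0, 1.0), normal_pdf(x, alt_mean, alt_sd)) for x in z]

    # Forward pass, rescaled at every step to avoid numerical underflow.
    fwd, scale = [], []
    for t in range(n):
        if t == 0:
            raw = [init[s] * emis[0][s] for s in (0, 1)]
        else:
            raw = [
                (fwd[t - 1][0] * trans[0][s] + fwd[t - 1][1] * trans[1][s]) * emis[t][s]
                for s in (0, 1)
            ]
        c = raw[0] + raw[1]
        fwd.append([raw[0] / c, raw[1] / c])
        scale.append(c)

    # Backward pass, using the same scaling constants.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in (0, 1):
            bwd[t][s] = sum(
                trans[s][u] * emis[t + 1][u] * bwd[t + 1][u] for u in (0, 1)
            ) / scale[t + 1]

    # Combine and normalise to get the posterior probability of the null state.
    post = []
    for t in range(n):
        a = fwd[t][0] * bwd[t][0]
        b = fwd[t][1] * bwd[t][1]
        post.append(a / (a + b))
    return post


def reject_by_local_fdr(null_prob, level):
    """Reject the k smallest null probabilities, for the largest k whose
    running mean is <= level. Returns a list of booleans in input order."""
    order = sorted(range(len(null_prob)), key=lambda i: null_prob[i])
    running_sum, n_reject = 0.0, 0
    for k, i in enumerate(order, start=1):
        running_sum += null_prob[i]
        if running_sum / k <= level:
            n_reject = k
    rejected = [False] * len(null_prob)
    for i in order[:n_reject]:
        rejected[i] = True
    return rejected


# Toy ordered statistics with a short run of signals in the middle (made-up data).
z_scores = [0.1, -0.3, 3.5, 4.0, 3.8, 0.2, -0.1]
transition = [[0.95, 0.05], [0.20, 0.80]]  # assumed known for illustration
initial = [0.9, 0.1]                       # assumed known for illustration

null_probs = posterior_null_prob(z_scores, transition, initial, alt_mean=3.0)
print(reject_by_local_fdr(null_probs, level=0.05))
```

Because each posterior probability conditions on the entire sequence, a moderately large statistic that sits next to other large statistics is more likely to be flagged than the same value in isolation, which is the kind of power gain for ordered data that the abstract refers to. In practice the HMM parameters would have to be estimated rather than assumed known.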

Acknowledgements

This work was supported by R21HG007840.

Author information

Authors and Affiliations

Department of Biological Sciences, Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
Yi-Hui Zhou
Department of Statistical Sciences and Operations Research and Department of Supply Chain Management and Analytics, Virginia Commonwealth University, Richmond, VA, USA
Paul Brooks
IMEDACS, LLC, Ann Arbor, MI, USA
Xiaoshan Wang

Corresponding author

Correspondence to Yi-Hui Zhou.

Additional information

Yi-Hui Zhou and Xiaoshan Wang have contributed equally to this work.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 687 KB)

About this article

Cite this article

Zhou, YH., Brooks, P. & Wang, X. A Two-Stage Hidden Markov Model Design for Biomarker Detection, with Application to Microbiome Research. Stat Biosci 10, 41–58 (2018). https://doi.org/10.1007/s12561-017-9187-y

Received: 05 January 2016
Revised: 27 October 2016
Accepted: 20 January 2017
Published: 10 February 2017
Issue Date: April 2018
DOI: https://doi.org/10.1007/s12561-017-9187-y
