It has been recognized that, for appropriately ordered data, hidden Markov models (HMMs) with local false discovery rate (FDR) control can increase the power to detect significant associations. For many high-throughput technologies, cost still limits their application. Two-stage designs are attractive: a set of interesting features or biomarkers is identified in a first stage and then followed up in a second stage. However, to our knowledge, no two-stage FDR control procedure with HMMs has been developed. In this paper, we study an efficient HMM–FDR-based two-stage design that uses a simple integrated analysis procedure across the stages. Numerical studies show excellent performance compared with available methods. A power analysis method is also proposed. We use examples from microbiome data to illustrate the methods.
Keywords: Biomarker; False discovery rates; Hidden Markov model; Metagenomics; Metatranscriptomics; PCR