Abstract
Detecting differentially expressed genes is difficult because of the large number of genes tested simultaneously, which leaves each individual test with low power after adjustment for multiplicity. We propose a novel adaptive filtering procedure that improves power by filtering out genes that are unlikely to be differentially expressed. We show that the proposed procedure controls the false discovery rate asymptotically. A simulation study further demonstrates its advantage over state-of-the-art competitors.
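The abstract does not spell out the adaptive procedure itself, so the sketch below only illustrates the general filter-then-test idea it builds on. It is not the chapter's method. It assumes a fixed, user-chosen threshold on a filter statistic (for example, overall variance across all samples, computed without group labels), whereas the proposed procedure chooses its filter adaptively. Genes that pass the filter are then tested with the Benjamini-Hochberg step-up procedure at level `alpha`. The helper names `benjamini_hochberg` and `filter_then_bh` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    if m == 0:
        return []
    # Rank hypotheses by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values; report original indices in order.
    return sorted(order[:k_max])


def filter_then_bh(
    pvalues: Sequence[float],
    filter_stats: Sequence[float],
    threshold: float,
    alpha: float = 0.05,
) -> List[int]:
    """Hypothetical helper: drop genes whose filter statistic is at or below
    `threshold` (a fixed cutoff, NOT the chapter's adaptive rule), then run
    BH on the surviving genes only. Returns indices into the original list."""
    kept = [i for i, s in enumerate(filter_stats) if s > threshold]
    rejected = benjamini_hochberg([pvalues[i] for i in kept], alpha)
    return [kept[j] for j in rejected]


if __name__ == "__main__":
    # Toy data: 10 genes; filter_stats stands in for, e.g., overall variance.
    pvalues = [0.001, 0.008, 0.013, 0.022, 0.04, 0.3, 0.5, 0.7, 0.9, 0.95]
    filter_stats = [5.0, 4.0, 3.5, 3.0, 2.5, 0.2, 0.1, 0.3, 0.2, 0.1]
    print(benjamini_hochberg(pvalues, alpha=0.05))  # [0, 1, 2]
    print(filter_then_bh(pvalues, filter_stats, threshold=1.0, alpha=0.05))  # [0, 1, 2, 3, 4]
```

In this toy example, filtering out five low-variance genes shrinks the multiplicity correction from 10 tests to 5, so two additional genes are declared significant. For the filtered analysis to retain false discovery rate control, the filter statistic is typically required to be independent of the p-value under the null hypothesis; a filter that uses the group labels can break this.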
Acknowledgements
Kun Liang is supported by Natural Sciences and Engineering Research Council of Canada (NSERC) grant 435666-2013.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Nie, Z., Liang, K. (2017). Adaptive Filtering Increases Power to Detect Differentially Expressed Genes. In: Chen, DG., Jin, Z., Li, G., Li, Y., Liu, A., Zhao, Y. (eds) New Advances in Statistics and Data Science. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-69416-0_8
DOI: https://doi.org/10.1007/978-3-319-69416-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69415-3
Online ISBN: 978-3-319-69416-0
eBook Packages: Mathematics and Statistics (R0)