Discovery Among Binary Biomarkers in Heterogeneous Populations

Geng, Junxian; Slate, Elizabeth H.

doi:10.1007/978-3-030-33416-1_11

Part of the book series: Emerging Topics in Statistics and Biostatistics (ETSB)

Abstract

Biomarkers have great potential to improve disease diagnosis and treatment. Disease may, however, arise via multiple pathways, each associated with distinct complex interactions among multiple biomarkers; hence patients exhibit considerable heterogeneity in the biomarker-disease association despite sharing the same clinical diagnosis. Identifying clinically useful biomarker combinations therefore requires statistical methods that accommodate population heterogeneity and enable discovery of possibly complex interactions among biomarkers that are associated with disease. We address the joint modeling of binary and continuous disease outcomes when the association between predictors and these outcomes exhibits heterogeneity. In the context of binary biomarkers, we use ideas from logic regression to find Boolean combinations of these biomarkers that predict the binary disease outcome. The associated continuous outcome is modeled as Gaussian. Heterogeneity is cast as unknown subgroups in the population, with the associations between the joint outcome and the biomarkers and other covariates varying by subgroup. We adopt a fully Bayesian mixture of finite mixtures (MFM) formulation to simultaneously estimate the number of subgroups, the subgroup membership structure, and the subgroup-specific relationships between outcomes and predictors. We describe how our model incorporates the Boolean relations as parameters arising from the MFM model, and we describe our approach to the associated challenges of specifying the prior distribution and performing estimation via Markov chain Monte Carlo. We illustrate the performance of the methods using simulation and discuss their application.
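As a rough illustration of how a Boolean combination of binary biomarkers can enter a joint model of a binary and a Gaussian outcome, the following Python sketch evaluates one hypothetical logic term, (X1 AND NOT X2) OR X3, and the resulting log-likelihood of a single subgroup's outcomes. The logistic link for the binary outcome, the linear Gaussian mean for the continuous outcome, their conditional independence given the term, and all function and parameter names are assumptions made for this example, not the chapter's specification.

```python
import math
from typing import Sequence


def boolean_term(x: Sequence[int]) -> int:
    """Hypothetical Boolean term (X1 AND NOT X2) OR X3 on 0/1 biomarkers.

    x[0], x[1], x[2] hold X1, X2, X3; the chosen combination is illustrative only.
    """
    return int((x[0] == 1 and x[1] == 0) or x[2] == 1)


def softplus(eta: float) -> float:
    # Numerically stable log(1 + exp(eta))
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def subgroup_loglik(y_bin, y_cont, X, beta, mu, sigma):
    """Joint log-likelihood of one subgroup's outcomes (assumed model).

    Assumes logit P(y_bin = 1) = beta[0] + beta[1] * L and
    y_cont ~ Normal(mu[0] + mu[1] * L, sigma^2), where L = boolean_term(x),
    with the two outcomes conditionally independent given L.
    """
    total = 0.0
    for yb, yc, x in zip(y_bin, y_cont, X):
        L = boolean_term(x)
        # Bernoulli log-pmf with a logistic link
        eta = beta[0] + beta[1] * L
        total += yb * eta - softplus(eta)
        # Gaussian log-density of the continuous outcome
        m = mu[0] + mu[1] * L
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - (yc - m) ** 2 / (2 * sigma ** 2)
    return total
```

Under the subgroup structure described in the abstract, each cluster would carry its own Boolean term and parameters, and the overall log-likelihood would sum `subgroup_loglik` over clusters; the search over Boolean terms that logic regression performs is not shown here.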

Notes

1. Henceforth we refer to the classes defining the underlying subpopulation structure as clusters, for greater consistency with the machine learning and Bayesian literature. The cluster configuration is the cluster assignment information encoded by the labels {z_i}. Because each individual is assigned to exactly one cluster, the cluster configuration is, equivalently, a partition of the n observations into K groups.
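Because an MFM prior is a prior on exactly this partition, a small sketch may help connect the labels {z_i} to it. Assuming the MFM of Miller and Harrison (2018) with symmetric Dirichlet(γ) mixture weights and, purely for illustration, K - 1 ~ Poisson(λ), the code converts labels to a partition C and evaluates log p(C), where p(C) = V_n(t) ∏_{c ∈ C} γ^(|c|), t is the number of occupied clusters, x^(m) is the rising factorial, and V_n(t) = Σ_k k_(t) p_K(k) / (γk)^(n) with k_(t) the falling factorial. The function names, the Poisson choice, and the truncation point `k_max` are hypothetical; the chapter's own prior settings may differ.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def labels_to_partition(z: Sequence[int]) -> List[List[int]]:
    """Map cluster labels z_i to a partition of the indices {0, ..., n-1}.

    Only which indices share a label matters, so relabelling z leaves the
    partition unchanged; blocks are ordered by their smallest index.
    """
    blocks: Dict[int, List[int]] = defaultdict(list)
    for i, zi in enumerate(z):
        blocks[zi].append(i)
    return sorted(blocks.values(), key=lambda b: b[0])


def log_rising(x: float, m: int) -> float:
    # log of the rising factorial x (x + 1) ... (x + m - 1)
    return math.lgamma(x + m) - math.lgamma(x)


def log_p_K(k: int, lam: float) -> float:
    # Illustrative prior on the number of components: K - 1 ~ Poisson(lam)
    j = k - 1
    return j * math.log(lam) - lam - math.lgamma(j + 1)


def log_V(n: int, t: int, gamma: float, lam: float, k_max: int = 200) -> float:
    """log V_n(t) = log sum_k k_(t) p_K(k) / (gamma k)^(n), truncated at k_max."""
    terms = [
        math.lgamma(k + 1) - math.lgamma(k - t + 1)  # log falling factorial k_(t)
        - log_rising(gamma * k, n)
        + log_p_K(k, lam)
        for k in range(t, k_max + 1)  # k_(t) = 0 when k < t, so start at t
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(a - top) for a in terms))


def mfm_log_prior(z: Sequence[int], gamma: float = 1.0, lam: float = 1.0) -> float:
    """Log prior probability of the partition encoded by z under the assumed MFM."""
    blocks = labels_to_partition(z)
    n, t = len(z), len(blocks)
    return log_V(n, t, gamma, lam) + sum(log_rising(gamma, len(b)) for b in blocks)
```

Working in log space with `math.lgamma` avoids overflow of the factorial terms. Note that t, the number of occupied clusters, can be smaller than the number of mixture components K; in Miller and Harrison's collapsed samplers it is ratios such as V_n(t+1)/V_n(t) that enter the cluster reassignment probabilities.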

Acknowledgements

The authors were partially supported by grants R01MH104423, R01HD078410 and R01HD093055 from the National Institutes of Health. Portions of this work were revised while E. Slate was the Visiting Scholar in Honor of David C. Jordan at AbbVie, Inc. in North Chicago, IL and also a Research Fellow with the Statistical and Applied Mathematical Sciences Institute in Durham, NC. Additional support from the Graduate School and Department of Statistics at Florida State University is gratefully acknowledged. Figures 1 and 2 were adapted from a figure provided by Dr. Zhengwu Zhang, Univ. of Rochester. The authors thank the reviewers for comments that led to improvement of this manuscript.

Author information

Authors and Affiliations

Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, USA
Junxian Geng
Department of Statistics, Florida State University, Tallahassee, FL, USA
Elizabeth H. Slate

Corresponding author

Correspondence to Elizabeth H. Slate.

Editor information

Editors and Affiliations

Math and Statistics, 1342, Georgia State University, Atlanta, GA, USA
Yichuan Zhao
School of Social Work, University of North Carolina, Chapel Hill, NC, USA
Ding-Geng (Din) Chen

About this chapter

Cite this chapter

Geng, J., Slate, E.H. (2020). Discovery Among Binary Biomarkers in Heterogeneous Populations. In: Zhao, Y., Chen, DG. (eds) Statistical Modeling in Biomedical Research. Emerging Topics in Statistics and Biostatistics. Springer, Cham. https://doi.org/10.1007/978-3-030-33416-1_11

Published: 20 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33415-4
Online ISBN: 978-3-030-33416-1
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)
