Species Sampling Priors for Modeling Dependence: An Application to the Detection of Chromosomal Aberrations

  • Chapter
Nonparametric Bayesian Inference in Biostatistics

Abstract

We discuss a class of Bayesian nonparametric priors that can be used to model local dependence in a sequence of observations. Many popular Bayesian nonparametric priors can be characterized in terms of exchangeable species sampling sequences. In some applications, however, the usual exchangeability assumption may not be appropriate. We discuss a generalization of species sampling sequences in which the weights in the predictive probability functions are allowed to depend on a sequence of independent (not necessarily identically distributed) latent random variables. More specifically, we consider conditionally identically distributed (CID) Pitman-Yor sequences and the Beta-GOS sequences recently introduced by Airoldi et al. (Journal of the American Statistical Association, 109, 1466–1480, 2014). We show how these processes can be used as prior distributions in a hierarchical Bayes modeling framework and, in particular, how the Beta-GOS can provide a reasonable alternative to non-homogeneous hidden Markov models, while further allowing unsupervised clustering of the observations into an unknown number of states. The usefulness of the approach in biostatistical applications is discussed and explicitly shown for the detection of chromosomal aberrations in breast cancer.
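To make the idea of predictive weights driven by latent variables concrete, the following is a minimal Python sketch, not the chapter's implementation. It assumes one particular weight form: with independent latent variables W_1, ..., W_n drawn from Beta distributions, the next observation repeats the j-th past value with weight (1 - W_j) times the product of W_i for i > j, and is a fresh draw from a base distribution with weight equal to the product of all W_i. This form is our reading of the Beta-GOS construction and should be checked against Airoldi et al. (2014). The function names, the `beta_params` argument, and the standard normal base distribution are hypothetical choices made for illustration.

```python
import random


def predictive_weights(latents):
    """Return (tie_weights, new_weight) for latent values W_1..W_m in [0, 1].

    Assumed form (an illustration, to be checked against the paper):
      tie_weights[j] = (1 - W_j) * prod_{i > j} W_i   # repeat the j-th past value
      new_weight     = prod_i W_i                     # draw a fresh value
    The weights sum to one by construction.
    """
    tie_weights = []
    tail = 1.0  # running product of W_i over indices i > j
    for w in reversed(latents):
        tie_weights.append((1.0 - w) * tail)
        tail *= w
    tie_weights.reverse()
    return tie_weights, tail


def simulate_sequence(n, beta_params=lambda i: (1.0, 1.0), seed=0):
    """Simulate n >= 1 observations from the assumed predictive rule.

    beta_params(i) returns the (a_i, b_i) parameters of the i-th latent
    variable, so the latents are independent but need not be identically
    distributed. The base distribution is a standard normal (hypothetical).
    """
    rng = random.Random(seed)
    values = [rng.gauss(0.0, 1.0)]  # first observation comes from the base
    latents = []
    for i in range(1, n):
        a, b = beta_params(i)
        latents.append(rng.betavariate(a, b))
        tie_weights, new_weight = predictive_weights(latents)
        # Index i stands for "new value"; indices 0..i-1 point at past values.
        pick = rng.choices(range(i + 1), weights=tie_weights + [new_weight])[0]
        if pick == i:
            values.append(rng.gauss(0.0, 1.0))
        else:
            values.append(values[pick])
    return values


if __name__ == "__main__":
    xs = simulate_sequence(20, seed=1)
    print(len(set(xs)), "distinct values among", len(xs), "observations")
```

Under this assumed form, a small latent value W_n puts most of the predictive mass on the most recent observation, which is one way a sequence can exhibit the local clustering described above without being exchangeable.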

References

  • Airoldi, E., Costa, T., Bassetti, F., Leisen, F., and Guindani, M. (2014). Generalized Species Sampling Priors With Latent Beta Reinforcements. Journal of the American Statistical Association, 109, 1466–1480.

  • Airoldi, E. M., Anderson, A., Fienberg, S., and Skinner, K. (2006). Who wrote Ronald Reagan’s radio addresses? Bayesian Anal., 1, 289–320.

  • Baladandayuthapani, V., Ji, Y., Talluri, R., Nieto-Barajas, L. E., and Morris, J. S. (2010). Bayesian random segmentation models to identify shared copy number aberrations for array CGH data. Journal of the American Statistical Association, 105(492), 1358–1375.

  • Bassetti, F., Crimaldi, I., and Leisen, F. (2010). Conditionally identically distributed species sampling sequences. Adv. in Appl. Probab., 42, 433–459.

  • Berti, P., Pratelli, L., and Rigo, P. (2004). Limit Theorems for a Class of Identically Distributed Random Variables. Ann. Probab., 32(3), 2029–2052.

  • Blackwell, D. and MacQueen, J. (1973). Ferguson distributions via Pólya urn schemes. Ann. Statist., 1, 353–355.

  • Blei, D. and Frazier, P. (2011). Distance dependent Chinese restaurant processes. Journal of Machine Learning Research, 12, 2461–2488.

  • Cardin, N., Holmes, C., The Wellcome Trust Case Control Consortium, Donnelly, P., and Marchini, J. (2011). Bayesian hierarchical mixture modeling to assign copy number from a targeted CNV array. Genetic Epidemiology, 35(6), 536–548.

  • Chin, K., DeVries, S., Fridlyand, J., Spellman, P. T., Roydasgupta, R., Kuo, W.-L., Lapuk, A., Neve, R. M., Qian, Z., Ryder, T., Chen, F., Feiler, H., Tokuyasu, T., Kingsley, C., Dairkee, S., Meng, Z., Chew, K., Pinkel, D., Jain, A., Ljung, B. M., Esserman, L., Albertson, D. G., Waldman, F. M., and Gray, J. W. (2006). Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell, 10(6), 529–541.

  • DeSantis, S. M., Houseman, E. A., Coull, B. A., Louis, D. N., Mohapatra, G., and Betensky, R. A. (2009). A latent class model with hidden Markov dependence for array CGH data. Biometrics, 65(4), 1296–1305.

  • Dewar, M., Wiggins, C., and Wood, F. (2012). Inference in Hidden Markov Models with Explicit State Duration Distributions. Signal Processing Letters, IEEE, 19(4), 235–238.

  • Du, L., Chen, M., Lucas, J., and Carin, L. (2010). Sticky hidden Markov modelling of comparative genomic hybridization. IEEE Transactions on Signal Processing, 58(10), 5353–5368.

  • Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90, 577–588.

  • Ferguson, J. D. (1980). Variable duration models for speech. In Proceedings of the Symposium on the Applications of Hidden Markov Models to Text and Speech, pages 143–179.

  • Fortini, S., Ladelli, L., and Regazzini, E. (2000). Exchangeability, predictive distributions and parametric models. Sankhya, 62(1), 86–109.

  • Fox, E., Sudderth, E., Jordan, M., and Willsky, A. (2011). A sticky HDP-HMM with application to speaker diarization. Annals of Applied Statistics, 5(2A), 1020–1056.

  • Guha, S., Li, Y., and Neuberg, D. (2008). Bayesian hidden Markov modelling of array CGH data. Journal of the American Statistical Association, 103, 485–497.

  • Guindani, M., Müller, P., and Zhang, S. (2009). A Bayesian discovery procedure. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5), 905–925.

  • Hansen, B. and Pitman, J. (2000). Prediction rules for exchangeable sequences related to species sampling. Statist. Probab. Lett., 46, 251–256.

  • Heller, R., Stanley, D., Yekutieli, D., Rubin, N., and Benjamini, Y. (2006). Cluster-based analysis of fMRI data. Neuroimage, 33, 599–608.

  • Hilbe, J. M. (2011). Negative Binomial Regression. Cambridge University Press.

  • Ishwaran, H. and Zarepour, M. (2003). Random probability measures via Pólya sequences: revisiting the Blackwell-MacQueen urn scheme. Technical report, Arxiv.org.

  • Ji, Y., Lu, Y., and Mills, G. (2008). Bayesian models based on test statistics for multiple hypothesis testing problems. Bioinformatics, 24, 943–949.

  • Johnson, M. J. and Willsky, A. S. (2013). Bayesian nonparametric hidden semi-Markov models. J. Mach. Learn. Res., 14(1), 673–701.

  • Kim, S., Tadesse, M. G., and Vannucci, M. (2006). Variable selection in clustering via Dirichlet process mixture models. Biometrika, 93(4), 877–893.

  • Lee, J., Quintana, F., Müller, P., and Trippa, L. (2008). Defining Predictive Probability Functions for Species Sampling Models. Statist. Sci., 2(209–222).

  • Lee, J., Müller, P., Zhu, Y., and Ji, Y. (2013). A nonparametric Bayesian model for local clustering with application to proteomics. Journal of the American Statistical Association, 108(503), 775–788.

  • Lo, A. (1984). On a class of Bayesian nonparametric estimates: I density estimates. Ann. Statist., 12(1), 351–357.

  • MacEachern, S. N. (1999). Dependent nonparametric processes. In Proceedings of the Section on Bayesian Statistical Science.

  • MacEachern, S. N. and Müller, P. (1998). Estimating mixtures of Dirichlet process models. Journal of Computational and Graphical Statistics, 7, 223–238.

  • Marioni, J. C., Thorne, N. P., and Tavaré, S. (2006). BioHMM: a heterogeneous hidden Markov model for segmenting array CGH data. Bioinformatics, 22(9), 1144–1146.

  • Mitchell, C., Harper, M., and Jamieson, L. (1995). On the complexity of explicit duration HMM’s. IEEE Transactions on Speech and Audio Processing, 3(3), 213–217.

  • Müller, P. and Quintana, F. (2010). Random partition models with regression on covariates. Journal of Statistical Planning and Inference, 140(10), 2801–2808.

  • Müller, P., Parmigiani, G., and Rice, K. (2007). FDR and Bayesian multiple comparisons rules. In J. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith, and M. West, editors, Bayesian Statistics 8. Oxford, UK: Oxford University Press.

  • Neal, R. M. (2000). Markov Chain Sampling Methods for Dirichlet Process Mixture Models. Journal of Computational and Graphical Statistics, 9, 249–265.

  • Newton, M. A., Noueiry, A., Sarkar, D., and Ahlquist, P. (2004). Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics, 5, 155–176.

  • Park, J. and Dunson, D. (2010). Bayesian generalized product partition model. Statistica Sinica, 20, 1203–1226.

  • Pitman, J. (1996). Some developments of the Blackwell-MacQueen urn scheme, volume 30, pages 245–267. Lecture Notes-Monograph Series, Institute of Mathematical Statistics, Hayward, California.

  • Pitman, J. (2006). Combinatorial Stochastic Processes. Lecture Notes in Mathematics. Springer:Berlin / Heidelberg.

  • Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.

  • Redon, R., Fitzgerald, T., and Carter, N. (2009). Comparative genomic hybridization: DNA labeling, hybridization and detection. In M. Dufva, editor, DNA Microarrays for Biomedical Research, volume 529 of Methods in Molecular Biology, pages 267–278. Humana Press.

  • Storey, J. D. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value. The Annals of Statistics, 31, 2013–2035.

  • Storey, J. D. (2007). The optimal discovery procedure for large-scale significance testing, with applications to comparative microarray experiments. Biostatistics, 8, 414–432.

  • Sun, W., Reich, B. J., Tony Cai, T., Guindani, M., and Schwartzman, A. (2015). False discovery control in large-scale spatial multiple testing. Journal of the Royal Statistical Society Series B, 77, 59–83.

  • Taramasco, O. and Bauer, S. (2012). RHMM: Hidden Markov models simulations and estimations. Technical report, CRAN.

  • Teh, Y. W., Jordan, M. I., Beal, M. J., and Blei, D. M. (2006). Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476), 1566–1581.

  • Yau, C., Papaspiliopoulos, O., Roberts, G. O., and Holmes, C. (2011). Bayesian non-parametric hidden Markov models with applications in genomics. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(1), 37–57.

  • Yu, S.-Z. (2010). Hidden semi-Markov models. Artificial Intelligence, 174(2), 215–243. Special Review Issue.

Author information

Corresponding author

Correspondence to Michele Guindani.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Bassetti, F., Leisen, F., Airoldi, E., Guindani, M. (2015). Species Sampling Priors for Modeling Dependence: An Application to the Detection of Chromosomal Aberrations. In: Mitra, R., Müller, P. (eds) Nonparametric Bayesian Inference in Biostatistics. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-19518-6_5
