Bayesian Clustering for HIV1 Protease Inhibitor Contact Maps

  • Sandhya Prabhakaran
  • Julia E. Vogt
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11526)


We present a probabilistic model for clustering that allows clusters to overlap and applies when objects are available only as pairwise distances. Examples of such distance data are genomic string alignments and protein contact maps. In our model, an object may belong to one or more clusters at the same time. Because the model uses an Indian buffet process (IBP) prior, neither the number of clusters nor the number of overlapping clusters has to be fixed in advance. In this paper, we demonstrate the utility of the model on distance data obtained from HIV1 protease inhibitor contact maps.
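To illustrate how an IBP prior leaves the number of clusters open while allowing overlap, here is a minimal Python sketch of the standard IBP generative process, which draws a binary object-by-cluster assignment matrix. The function name `sample_ibp`, the concentration parameter `alpha`, and the use of NumPy are assumptions made for illustration. The abstract does not describe the model's likelihood over distances or its inference scheme, so neither is shown here.

```python
import numpy as np


def sample_ibp(n_objects, alpha, rng):
    """Draw a binary assignment matrix Z (objects x clusters) from an IBP prior.

    Illustrative sketch only: Z[i, k] == 1 means object i belongs to cluster k.
    """
    counts = []  # counts[k] = number of objects assigned to cluster k so far
    rows = []    # one membership list per object; later rows may be longer
    for i in range(1, n_objects + 1):
        row = []
        # Existing clusters: object i joins cluster k with probability m_k / i.
        for k in range(len(counts)):
            joins = int(rng.random() < counts[k] / i)
            row.append(joins)
            counts[k] += joins
        # New clusters: object i opens Poisson(alpha / i) clusters of its own.
        n_new = int(rng.poisson(alpha / i))
        row.extend([1] * n_new)
        counts.extend([1] * n_new)
        rows.append(row)

    # Pad the ragged rows with zeros into a rectangular matrix.
    Z = np.zeros((n_objects, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z


rng = np.random.default_rng(0)
Z = sample_ibp(n_objects=10, alpha=2.0, rng=rng)
print(Z.shape)        # the number of columns (clusters) is random
print(Z.sum(axis=1))  # a row sum above 1 marks an object in several clusters
```

In a draw like this, the number of columns differs from sample to sample, and any row containing more than one 1 corresponds to an object that belongs to several clusters at once.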


Keywords: Bayesian nonparametrics, Clustering, Medical informatics



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, USA
  2. Department of Computer Science, ETH Zurich, Zurich, Switzerland
