Feature Enriched Nonparametric Bayesian Co-clustering

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7301)

Abstract

Co-clustering has emerged as an important technique for mining relational data, especially when the data are sparse and high-dimensional. It simultaneously groups the different kinds of objects involved in a relation. Most co-clustering techniques leverage only the entries of the given contingency matrix to perform the two-way clustering. As a consequence, they cannot predict interaction values for new objects, which have no entries in the matrix. In many applications, though, additional features associated with the objects of interest are available. The Infinite Hidden Relational Model (IHRM) was proposed to make use of such features and can therefore forecast relationships among previously unseen objects. However, the work on IHRM lacks an evaluation of the improvement that features bring when making predictions for unseen objects. In this work, we fill this gap and re-interpret IHRM from a co-clustering point of view. We focus on the empirical evaluation of forecasting relationships between previously unseen objects by leveraging object features. The evaluation demonstrates the effectiveness of the feature-enriched approach and identifies the conditions under which features are most useful, namely when the data are sparse.
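
To make the argument of the abstract concrete, the following is a minimal sketch, not the IHRM inference procedure. It assumes hard co-cluster assignments are already given and uses block means as predictions, whereas the paper works with a nonparametric Bayesian model. The toy matrix, the feature vectors, and all function names are hypothetical. It illustrates why a new row object with no matrix entries can only be placed in a cluster, and hence receive predictions, through its features.

```python
from statistics import mean

# Toy relation (e.g. users x items); None marks an unobserved entry.
R = [
    [5,    4,    None, 1],
    [4,    None, 1,    1],
    [1,    1,    5,    None],
    [None, 1,    4,    5],
]

# ASSUMPTION: hard co-cluster assignments are given; a real model would infer them.
row_cluster = [0, 0, 1, 1]
col_cluster = [0, 0, 1, 1]

# ASSUMPTION: hypothetical two-dimensional feature vectors for the row objects.
row_features = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]


def block_means(matrix, rows, cols):
    """Mean of the observed entries in each (row cluster, column cluster) block."""
    blocks = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                blocks.setdefault((rows[i], cols[j]), []).append(value)
    return {key: mean(values) for key, values in blocks.items()}


def cluster_centroids(features, assignment):
    """Per-cluster average of the feature vectors."""
    groups = {}
    for vector, k in zip(features, assignment):
        groups.setdefault(k, []).append(vector)
    return {k: [mean(dim) for dim in zip(*vectors)] for k, vectors in groups.items()}


def assign_by_features(feature, centroids):
    """Pick the cluster whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda k: sq_dist(feature, centroids[k]))


def predict_for_new_row(feature, column):
    """Predict an entry for an unseen row object using only its features."""
    k = assign_by_features(feature, cluster_centroids(row_features, row_cluster))
    return block_means(R, row_cluster, col_cluster)[(k, col_cluster[column])]


new_user = [0.8, 0.2]  # unseen object: no row in R, only features
print(predict_for_new_row(new_user, 0))
print(predict_for_new_row(new_user, 2))
```

In this toy setting the new object falls into row cluster 0, so it inherits that cluster's block means for every column. Without the feature vector there would be no basis for choosing a row cluster at all, which is the limitation of matrix-only co-clustering that the abstract describes.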

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, P., Domeniconi, C., Rangwala, H., Laskey, K.B. (2012). Feature Enriched Nonparametric Bayesian Co-clustering. In: Tan, P.-N., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science (LNAI), vol 7301. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30217-6_43

  • DOI: https://doi.org/10.1007/978-3-642-30217-6_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30216-9

  • Online ISBN: 978-3-642-30217-6

  • eBook Packages: Computer Science (R0)
