Feature Enriched Nonparametric Bayesian Co-clustering

Wang, Pu; Domeniconi, Carlotta; Rangwala, Huzefa; Laskey, Kathryn B.

doi:10.1007/978-3-642-30217-6_43

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7301)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Co-clustering has emerged as an important technique for mining relational data, especially when the data are sparse and high-dimensional. Co-clustering simultaneously groups the different kinds of objects involved in a relation. Most co-clustering techniques leverage only the entries of the given contingency matrix to perform the two-way clustering. As a consequence, they cannot predict the interaction values for new objects. In many applications, though, additional features associated with the objects of interest are available. The Infinite Hidden Relational Model (IHRM) has been proposed to make use of these features, which gives it the capability to forecast relationships among previously unseen data. However, the work on IHRM lacks an evaluation of the improvement that can be achieved when leveraging features to make predictions for unseen objects. In this work, we fill this gap and re-interpret IHRM from a co-clustering point of view. We focus on the empirical evaluation of forecasting relationships between previously unseen objects by leveraging object features. The empirical evaluation demonstrates the effectiveness of the feature-enriched approach and identifies the conditions under which the use of features is most useful, namely when the data are sparse.
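
As an informal illustration of why object features make prediction for unseen objects possible, the following toy sketch may help. It is not the IHRM of the paper, which is a nonparametric Bayesian model; here the row and column clusters are simply assumed to be known and fixed, and a new object is assigned to the nearest cluster centroid in feature space. The data, the feature vectors, and all function names are hypothetical.

```python
from statistics import mean

# Toy relation: 4 users x 4 items; None marks an unobserved entry.
ratings = [
    [5, 4, None, 1],
    [4, 5, 1, None],
    [1, None, 5, 4],
    [None, 1, 4, 5],
]
row_cluster = [0, 0, 1, 1]  # assumed, fixed user clusters (not inferred here)
col_cluster = [0, 0, 1, 1]  # assumed, fixed item clusters (not inferred here)
# Hypothetical feature vectors, one per known user.
user_features = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]


def block_means(ratings, row_cluster, col_cluster):
    """Mean observed value for each (row cluster, column cluster) block."""
    blocks = {}
    for i, row in enumerate(ratings):
        for j, value in enumerate(row):
            if value is not None:
                key = (row_cluster[i], col_cluster[j])
                blocks.setdefault(key, []).append(value)
    return {key: mean(values) for key, values in blocks.items()}


def centroids(features, clusters):
    """Mean feature vector of each cluster."""
    grouped = {}
    for vec, k in zip(features, clusters):
        grouped.setdefault(k, []).append(vec)
    return {k: [mean(col) for col in zip(*vecs)] for k, vecs in grouped.items()}


def assign_by_features(vec, cents):
    """Cluster whose centroid is closest in squared Euclidean distance."""
    return min(
        cents,
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, cents[k])),
    )


def predict_new_user(vec, item):
    """Predict a rating for a user with no observed entries, from features only."""
    k = assign_by_features(vec, centroids(user_features, row_cluster))
    return block_means(ratings, row_cluster, col_cluster)[(k, col_cluster[item])]


if __name__ == "__main__":
    print(predict_new_user([0.8, 0.2], 0))
    print(predict_new_user([0.8, 0.2], 2))
```

A method that sees only the matrix entries has no row for the new user and therefore no basis for a cluster assignment; the feature vector is what links the new user to an existing block of the relation.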

Author information

Authors and Affiliations

George Mason University, 4400 University Ave., Fairfax, VA, 22030, USA
Pu Wang, Carlotta Domeniconi, Huzefa Rangwala & Kathryn B. Laskey

Editor information

Editors and Affiliations

Department of Computer Science, Michigan State University, 428 S. Shaw Lane, 48824-1226, East Lansing, MI, USA
Pang-Ning Tan
School of Information Technologies, University of Sydney, 1 Cleveland St., 2006, Sydney, NSW, Australia
Sanjay Chawla
Faculty of Computing and Informatics, Jalan Multimedia, Multimedia University, 63100, Cyberjaya, Selangor, Malaysia
Chin Kuan Ho
Department of Computing and Information Systems, The University of Melbourne, 111 Barry Street, 3053, Melbourne, VIC, Australia
James Bailey

About this paper

Cite this paper

Wang, P., Domeniconi, C., Rangwala, H., Laskey, K.B. (2012). Feature Enriched Nonparametric Bayesian Co-clustering. In: Tan, P.-N., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7301. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30217-6_43

DOI: https://doi.org/10.1007/978-3-642-30217-6_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30216-9
Online ISBN: 978-3-642-30217-6
eBook Packages: Computer Science, Computer Science (R0)