Abstract
This chapter reviews several state-of-the-art machine learning approaches to different types of link-based clustering. Specifically, we present spectral clustering for heterogeneous relational data; symmetric convex coding for homogeneous relational data; a citation model for clustering a special but popular kind of homogeneous relational data, namely textual documents with citations; a probabilistic clustering framework based on mixed membership for general relational data; and a statistical graphical model for dynamic relational clustering. We demonstrate the effectiveness of these approaches through empirical evaluations.
Notes
- 1. Each state is represented as a distinct cluster.
Acknowledgments
This work is supported in part by NSF grants IIS-0535162, IIS-0812114, IIS-0905215, and DBI-0960443, as well as by graduate research internships at Google Research Labs and NEC Laboratories America, Inc. Yun Chi, Yihong Gong, Xiaoyun Wu, and Shenghuo Zhu contributed to part of this material.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Zhang, Z., Long, B., Guo, Z., Xu, T., Yu, P.S. (2010). Machine Learning Approaches to Link-Based Clustering. In: Yu, P., Han, J., Faloutsos, C. (eds) Link Mining: Models, Algorithms, and Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6515-8_1
DOI: https://doi.org/10.1007/978-1-4419-6515-8_1
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6514-1
Online ISBN: 978-1-4419-6515-8
eBook Packages: Biomedical and Life Sciences (R0)