Abstract
We consider the problem of finding communities in large linked networks such as web structures or citation networks. We review similarity measures for linked objects and discuss the k-Means and EM algorithms, based on text similarity, bibliographic coupling, and co-citation strength. We study how the principle of multi-view learning can be used to combine these similarity measures. We evaluate the clustering algorithms experimentally on web pages and the Cite-Seer repository of research papers and find that multi-view clustering effectively combines link-based and intrinsic similarity.
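The two link-based measures named in the abstract can be illustrated on a toy citation graph. The sketch below uses the standard raw-count definitions (bibliographic coupling counts shared references, co-citation counts common citing papers); the graph, the function names, and the absence of any normalization are assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical toy citation graph: each paper maps to the set of papers it cites.
cites = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"e"},
    "d": set(),
    "e": set(),
}


def bibliographic_coupling(cites, p, q):
    """Count the references that papers p and q have in common."""
    return len(cites[p] & cites[q])


def co_citation(cites, p, q):
    """Count the papers whose reference lists contain both p and q."""
    return sum(1 for refs in cites.values() if p in refs and q in refs)


# Papers a and b both cite c and d.
coupling_ab = bibliographic_coupling(cites, "a", "b")

# Papers c and d are cited together by a and by b.
cocitation_cd = co_citation(cites, "c", "d")

print("coupling(a, b) =", coupling_ab)
print("co-citation(c, d) =", cocitation_cd)
```

Either count can serve as a link-based similarity between two documents, alongside a text-based similarity computed from their content.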
Copyright information
© 2006 Springer Berlin · Heidelberg
About this paper
Cite this paper
Drost, I., Bickel, S., Scheffer, T. (2006). Discovering Communities in Linked Data by Multi-view Clustering. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_41
DOI: https://doi.org/10.1007/3-540-31314-1_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31313-7
Online ISBN: 978-3-540-31314-4
eBook Packages: Mathematics and Statistics (R0)