Discovering Communities in Linked Data by Multi-view Clustering

  • Conference paper

Abstract

We consider the problem of finding communities in large linked networks such as web structures or citation networks. We review similarity measures for linked objects and discuss the k-Means and EM algorithms, based on text similarity, bibliographic coupling, and co-citation strength. We study the utilization of the principle of multi-view learning to combine these similarity measures. We explore the clustering algorithms experimentally using web pages and the Cite-Seer repository of research papers and find that multi-view clustering effectively combines link-based and intrinsic similarity.
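
The two link-based measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard bibliometric definitions: bibliographic coupling counts the references two documents share, and co-citation strength counts the documents that cite both. The toy graph, the function names, and the use of raw unnormalized counts are hypothetical choices for illustration; the paper may define or normalize the measures differently.

```python
def bibliographic_coupling(cites, a, b):
    """Number of references that documents a and b have in common."""
    return len(cites[a] & cites[b])


def co_citation(cites, a, b):
    """Number of documents whose reference sets contain both a and b."""
    return sum(1 for refs in cites.values() if a in refs and b in refs)


# Hypothetical toy citation graph: each key cites every document in its set.
cites = {
    "p1": {"p3", "p4"},
    "p2": {"p3", "p4", "p5"},
    "p3": set(),
    "p4": set(),
    "p5": {"p3"},
}

# p1 and p2 share the references p3 and p4.
coupling_p1_p2 = bibliographic_coupling(cites, "p1", "p2")

# p3 and p4 are cited together by p1 and by p2.
cocitation_p3_p4 = co_citation(cites, "p3", "p4")
```

Either count can serve as a pairwise similarity between documents, alongside a text-based similarity computed from the document contents.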

Copyright information

© 2006 Springer Berlin · Heidelberg

About this paper

Cite this paper

Drost, I., Bickel, S., Scheffer, T. (2006). Discovering Communities in Linked Data by Multi-view Clustering. In: Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., Gaul, W. (eds) From Data and Information Analysis to Knowledge Engineering. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-31314-1_41
