Abstract
The original Semantic Web vision was explicit about the need for intelligent autonomous agents that would represent users and help them navigate the Semantic Web. We argue that an essential feature for such agents is the capability to analyse data and learn. In this paper we outline the challenges and issues surrounding the application of clustering algorithms to Semantic Web data. We present several ways to extract instances from a large RDF graph and to compute the distance between them. We evaluate our approaches on three different datasets, applying both supervised and unsupervised evaluation metrics: one representing a typical relational-database-to-RDF conversion, one based on data from an ontologically rich Semantic Web enabled application, and one consisting of a crawl of FOAF documents. Our evaluation did not identify any single combination of instance extraction method and similarity metric as superior in all cases; as expected, the behaviour depends greatly on the data being clustered. Instead, we attempt to identify characteristics of data that make particular methods more suitable.
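The abstract does not spell out the extraction methods or distance metrics, so the following is only a rough illustration of the general pipeline under assumptions of our own. An "instance" is taken to be the set of triples reachable from a resource by following outgoing edges for a fixed number of hops. Two instances are compared by the Jaccard distance between their (predicate, object) pairs. The graph is a plain list of string triples rather than an RDF library object, and every name here (`extract_instance`, `features`, `jaccard_distance`, `distance_matrix`, the `ex:` resources) is hypothetical.

```python
from collections import deque
from itertools import combinations

Triple = tuple[str, str, str]


def extract_instance(graph: list[Triple], root: str, depth: int = 1) -> set[Triple]:
    """Assumed extraction scheme: collect triples reachable from `root` by
    following outgoing edges up to `depth` hops (depth=1 gives only the
    root's immediate properties)."""
    # Index triples by subject so each node's outgoing edges are cheap to find.
    by_subject: dict[str, list[Triple]] = {}
    for s, p, o in graph:
        by_subject.setdefault(s, []).append((s, p, o))

    instance: set[Triple] = set()
    visited = {root}
    frontier = deque([(root, 0)])
    while frontier:  # breadth-first crawl bounded by `depth`
        node, d = frontier.popleft()
        if d >= depth:
            continue
        for triple in by_subject.get(node, []):
            instance.add(triple)
            obj = triple[2]
            if obj not in visited:
                visited.add(obj)
                frontier.append((obj, d + 1))
    return instance


def features(instance: set[Triple]) -> set[tuple[str, str]]:
    # Drop subjects so that different resources can share features.
    return {(p, o) for _, p, o in instance}


def jaccard_distance(a: set, b: set) -> float:
    # 0.0 for identical feature sets, 1.0 for disjoint ones.
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def distance_matrix(graph: list[Triple], roots: list[str], depth: int = 1):
    feats = {r: features(extract_instance(graph, r, depth)) for r in roots}
    return {(a, b): jaccard_distance(feats[a], feats[b])
            for a, b in combinations(roots, 2)}


# Toy FOAF-like graph (hypothetical data).
graph = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:bob", "foaf:based_near", "ex:aberdeen"),
    ("ex:carol", "rdf:type", "foaf:Person"),
    ("ex:carol", "foaf:based_near", "ex:aberdeen"),
]
roots = ["ex:alice", "ex:bob", "ex:carol"]
d = distance_matrix(graph, roots, depth=1)
d2 = distance_matrix(graph, roots, depth=2)
```

At depth 1, `ex:bob` and `ex:carol` are identical (distance 0.0) and `ex:alice` is 2/3 away from both. At depth 2, `ex:alice` absorbs `ex:bob`'s properties and moves to 1/3 from him. This shows how the choice of extraction method alone can change the distances. The resulting pairwise distances could be passed to any clustering algorithm that accepts a precomputed distance matrix.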
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grimnes, G.A., Edwards, P., Preece, A. (2008). Instance Based Clustering of Semantic Web Resources. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., Koubarakis, M. (eds) The Semantic Web: Research and Applications. ESWC 2008. Lecture Notes in Computer Science, vol 5021. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68234-9_24
DOI: https://doi.org/10.1007/978-3-540-68234-9_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68233-2
Online ISBN: 978-3-540-68234-9
eBook Packages: Computer Science, Computer Science (R0)