Dimensional Clustering of Linked Data: Techniques and Applications

Ferrara, Alfio; Genta, Lorenzo; Montanelli, Stefano; Castano, Silvana

doi:10.1007/978-3-662-46562-2_3

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 8990)

Abstract

Linked data features are numerous and heterogeneous, so accurate matching and clustering require dedicated solutions. In this paper, we propose a dimensional clustering approach that provides (i) the capability to select the set of features used for data matching and clustering, which are packaged into a so-called thematic dimension, and (ii) the capability to make explicit the cause of similarity that generates each cluster. As a further contribution, we present ensemble techniques for combining different single-dimension cluster sets into a multi-dimensional view of the considered linked data. Finally, we discuss the application to linked data summarization and exploration.

Notes

1. For the sake of readability, only a subset of the available properties is reported (http://www.dbpedia.org).
2. More technical details about the construction of linked data items from the RDF statements of a repository \(\mathcal{R}\) are provided in [5].
3. Since \(\text{ldi-match}^{\mathcal{D}}(ldi_i, ldi_j) = \text{ldi-match}^{\mathcal{D}}(ldi_j, ldi_i)\), we define \(\sigma M\) and \(\pi M\) as upper triangular matrices.
4. A detailed presentation of summarization techniques is out of the scope of this work. Here, we outline how to generate a summary-view over a cluster set \(CL\). For the interested reader, a more technical presentation of cluster essential definition, proximity-link specification, and prominence value calculation is provided in [5].
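
Note 3 relies on the symmetry of the matching function: if the score for a pair does not depend on argument order, only the entries above the diagonal need to be computed and stored. The following is a minimal Python sketch of that idea, not the chapter's implementation. The function `token_jaccard` is a hypothetical stand-in for ldi-match, the helper names are illustrative, and treating self-similarity as 1.0 is an assumption made here.

```python
from itertools import combinations


def token_jaccard(a: str, b: str) -> float:
    """Hypothetical symmetric matching function (a stand-in, not ldi-match)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def upper_triangular_matches(items, match):
    """Evaluate `match` only for pairs with i < j.

    Entries on and below the diagonal are left at 0.0, so the result is an
    upper triangular matrix with n * (n - 1) / 2 computed values.
    """
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = match(items[i], items[j])
    return matrix


def lookup(matrix, i, j):
    """Read the score of any ordered pair from the upper triangle."""
    if i == j:
        return 1.0  # assumption: an item fully matches itself
    return matrix[min(i, j)][max(i, j)]


# Illustrative labels only; these are not data from the chapter.
items = ["Milan city Italy", "Milan Italy", "Rome Italy"]
m = upper_triangular_matches(items, token_jaccard)
same_both_ways = lookup(m, 0, 1) == lookup(m, 1, 0)
```

Storing only the pairs with i < j halves the number of match evaluations, and `lookup` restores the symmetric view for callers that pass indices in either order.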

References

1. Amigó, E., Gonzalo, J., Artiles, J., Verdejo, F.: A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf. Retr. 12(4), 461–486 (2009)
2. Bae, E., Bailey, J.: COALA: a novel approach for the extraction of an alternate clustering of high quality and high dissimilarity. In: Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), Hong Kong, China, pp. 53–62 (2006)
3. Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data. Springer, Heidelberg (2006)
4. Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Semant. Web Inf. Syst. 5(3), 1–22 (2009)
5. Castano, S., Ferrara, A., Montanelli, S.: Thematic clustering and exploration of linked data. In: Ceri, S., Brambilla, M. (eds.) Search Computing. LNCS, vol. 7538, pp. 157–175. Springer, Heidelberg (2012)
6. Drost, I., Bickel, S., Scheffer, T.: Discovering communities in linked data by multi-view clustering. In: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation, Magdeburg, Germany, pp. 342–349 (2005)
7. Ferrara, A., Nikolov, A., Scharffe, F.: Data linking for the semantic web. Int. J. Semant. Web Inf. Syst. 7(3), 46–76 (2011)
8. Ferrara, A., Genta, L., Montanelli, S.: Linked data classification: a feature-based approach. In: Proceedings of the 3rd EDBT International Workshop on Linked Web Data Management (LWDM 2013), Genova, Italy (2013)
9. Giannakidou, E., Vakali, A.: Integrating web 2.0 data into linked open data cloud via clustering. In: Proceedings of the Workshop on Linked Data in the Future Internet at the Future Internet Assembly, Ghent, Belgium (2010)
10. Goldberg, M.K., Hayvanovych, M., Magdon-Ismail, M.: Measuring similarity between sets of overlapping clusters. In: Proceedings of the IEEE SocialCom/PASSAT Conference, Minneapolis, Minnesota, USA, pp. 303–308 (2010)
11. Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17(2–3), 107–145 (2001)
12. Jean-Mary, Y.R., Shironoshita, E.P., Kabuka, M.R.: Ontology matching with semantic verification. J. Web Semant. 7(3), 235–251 (2009)
13. Kailing, K., Kriegel, H.-P., Pryakhin, A., Schubert, M.: Clustering multi-represented objects with noise. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 394–403. Springer, Heidelberg (2004)
14. Lu, Q., Conrad, J.G., Al-Kofahi, K., Keenan, W.: Legal document clustering with built-in topic segmentation. In: Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM 2011), Glasgow, UK (2011)
15. Minaei-Bidgoli, B., Topchy, A.P., Punch, W.F.: A comparison of resampling methods for clustering ensembles. In: Proceedings of the International Conference on Artificial Intelligence (IC-AI 2004), Las Vegas, Nevada, USA, pp. 939–945 (2004)
16. Müller, E., Günnemann, S., Färber, I., Seidl, T.: Discovering multiple clustering solutions: grouping objects in different views of the data. In: Proceedings of the 28th IEEE International Conference on Data Engineering (ICDE 2012), Washington, DC, USA, pp. 1207–1210 (2012)
17. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)
18. Newman, M.J.: A measure of betweenness centrality based on random walks. Soc. Netw. 27(1), 39–54 (2005)
19. Nguyen, X.V., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), Montreal, Quebec, Canada (2009)
20. Steinbach, M., Karypis, G., Kumar, V., et al.: A comparison of document clustering techniques. In: Proceedings of the 6th ACM SIGKDD KDD-2000 Workshop on Text Mining, Boston, MA, USA (2000)
21. Strehl, A., Ghosh, J.: Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)
22. Vega-Pons, S., Ruiz-Shulcloper, J.: A survey of clustering ensemble algorithms. Int. J. Pattern Recogn. Artif. Intell. 25(3), 337–372 (2011)
23. Verykios, V.S., Elmagarmid, A.K., Houstis, E.N.: Automating the approximate record-matching process. Inf. Sci. 126(1–4), 83–98 (2000)
24. Wang, Z., Li, J., Zhao, Y., Setchi, R., Tang, J.: A unified approach to matching semantic data on the web. Knowl. Based Syst. 39, 173–184 (2013)
25. Xu, R., Wunsch II, D.C.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)
26. Zhao, Y., Karypis, G.: Empirical and theoretical comparisons of selected criterion functions for document clustering. Mach. Learn. 55(3), 311–331 (2004)
27. Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. Proc. VLDB Endow. 2(1), 718–729 (2009)

Author information

Authors and Affiliations

Università degli Studi di Milano, DI - Via Comelico, 39, 20135, Milano, Italy
Alfio Ferrara, Lorenzo Genta, Stefano Montanelli & Silvana Castano

Corresponding author

Correspondence to Stefano Montanelli.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
FAW, University of Linz, Linz, Austria
Roland Wagner
University of Brescia, Brescia, Italy
Devis Bianchini
University of Brescia, Brescia, Italy
Valeria De Antonellis
University of Rome III, Rome, Italy
Roberto De Virgilio

About this chapter

Cite this chapter

Ferrara, A., Genta, L., Montanelli, S., Castano, S. (2015). Dimensional Clustering of Linked Data: Techniques and Applications. In: Hameurlain, A., Küng, J., Wagner, R., Bianchini, D., De Antonellis, V., De Virgilio, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX. Lecture Notes in Computer Science, vol 8990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46562-2_3

DOI: https://doi.org/10.1007/978-3-662-46562-2_3
Published: 24 February 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46561-5
Online ISBN: 978-3-662-46562-2
eBook Packages: Computer Science, Computer Science (R0)
