Skip to main content

Dimensional Clustering of Linked Data: Techniques and Applications

  • Chapter
  • First Online:
Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX

Part of the book series: Lecture Notes in Computer Science ((TLDKS,volume 8990))

Abstract

The plurality and heterogeneity of linked data features require appropriate solutions for accurate matching and clustering. In this paper, we propose a dimensional clustering approach to enforce (i) the capability to select the set of features to use for data matching and clustering, that are packaged into the so-called thematic dimension, and (ii) the capability to make explicit the cause of similarity that generates each cluster. Ensemble techniques for combining different single-dimension cluster sets into a sort of multi-dimensional view of the considered linked data are also presented as a further contribution of the paper. Application to linked data summarization and exploration is finally discussed.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    For the sake of readability, only a subset of the available properties is reported (http://www.dbpedia.org).

  2. 2.

    More technical details about the construction of linked data items from the RDF statements of a repository \(\mathcal {R}\) are provided in [5].

  3. 3.

    Since \({\text {ldi-match}}^{\mathcal {D}}(ldi_i, ldi_j) = {\text {ldi-match}}^{\mathcal {D}}(ldi_j, ldi_i)\), we define \(\sigma M\) and \(\pi M\) as upper triangular matrices.

  4. 4.

    A detailed presentation of summarization techniques is out of the scope of this work. Here, we outline how to generate a summary-view over a cluster set \(CL\). For the interested reader, a more technical presentation of cluster essential definition, proximity-link specification, and prominence value calculation is provided in [5].

References

  1. Amigó, E., Gonzalo, J., Artiles, J., Verdejo, F.: A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf. Retr. 12(4), 461–486 (2009)

    Article  Google Scholar 

  2. Bae, E., Bailey, J.: COALA: a novel approach for the extraction of an alternate clustering of high quality and high dissimilarity. In: Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), Hong Kong, China, pp. 53–62 (2006)

    Google Scholar 

  3. Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data. Springer, Heidelberg (2006)

    Google Scholar 

  4. Bizer, C., Heath, T., Berners-Lee, T.: Linked data - the story so far. Int. J. Semant. Web Inf. Syst. 5(3), 1–22 (2009)

    Article  Google Scholar 

  5. Castano, S., Ferrara, A., Montanelli, S.: Thematic clustering and exploration of linked data. In: Ceri, S., Brambilla, M. (eds.) Search Computing. LNCS, vol. 7538, pp. 157–175. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  6. Drost, I., Bickel, S., Scheffer, T.: Discovering communities in linked data by multi-view clustering. In: Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation, Magdeburg, Germany, pp. 342–349 (2005)

    Google Scholar 

  7. Ferrara, A., Nikolov, A., Scharffe, F.: Data linking for the semantic web. Int. J. Semant. Web Inf. Syst. 7(3), 46–76 (2011)

    Article  Google Scholar 

  8. Ferrara, A., Genta, L., Montanelli, S.: Linked data classification: a feature-based approach. In: Proceedings of the 3rd EDBT International Workshop on Linked Web Data Management (LWDM 2013), Genova, Italy (2013)

    Google Scholar 

  9. Giannakidou, E., Vakali, A.: Integrating web 2.0 data into linked open data cloud via clustering. In: Proceedings of the Workshop on Linked Data in the Future Internet at the Future Internet Assembly, Ghent, Belgium (2010)

    Google Scholar 

  10. Goldberg, M.K., Hayvanovych, M., Magdon-Ismail, M.: Measuring similarity between sets of overlapping clusters. In: Proceedings of the IEEE SocialCom/PASSAT Conference, Minneapolis, Minnesota, USA, pp. 303–308 (2010)

    Google Scholar 

  11. Halkidi, M., Batistakis, Y., Vazirgiannis, M.: On clustering validation techniques. J. Intell. Inf. Syst. 17(2–3), 107–145 (2001)

    Article  MATH  Google Scholar 

  12. Jean-Mary, Y.R., Shironoshita, E.P., Kabuka, M.R.: Ontology matching with semantic verification. J. Web Semant. 7(3), 235–251 (2009)

    Article  Google Scholar 

  13. Kailing, K., Kriegel, H.-P., Pryakhin, A., Schubert, M.: Clustering multi-represented objects with noise. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 394–403. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  14. Lu, Q., Conrad, J.G., Al-Kofahi, K., Keenan, W.: Legal document clustering with built-in topic segmentation. In: Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM 2011), Glasgow, UK (2011)

    Google Scholar 

  15. Minaei-Bidgoli, B., Topchy, A.P., Punch, W.F.: A comparison of resampling methods for clustering ensembles. In: Proceedings of the International Conference on Artificial Intelligence (IC-AI 2004), Las Vegas, Nevada, USA, pp. 939–945 (2004)

    Google Scholar 

  16. Müller, E., Günnemann, S., Färber, I., Seidl, T.: Discovering multiple clustering solutions: grouping objects in different views of the data. In: Proceedings of the 28th IEEE International Conference on Data Engineering (ICDE 2012), Washington, DC, USA, pp. 1207–1210 (2012)

    Google Scholar 

  17. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)

    Article  Google Scholar 

  18. Newman, M.J.: A measure of betweenness centrality based on random walks. Soc. Netw. 27(1), 39–54 (2005)

    Article  Google Scholar 

  19. Nguyen, X.V., Epps, J., Bailey, J.: Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: Proceedings of the 26th Annual International Conference on Machine Learning (ICML 2009), Montreal, Quebec, Canada (2009)

    Google Scholar 

  20. Steinbach, M., Karypis, G., Kumar, V., et al.: A comparison of document clustering techniques. In: Proceedings of the 6th ACM SIGKDD KDD-2000 Workshop on Text Mining, Boston, MA, USA (2000)

    Google Scholar 

  21. Strehl, A., Ghosh, J.: Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 3, 583–617 (2002)

    MathSciNet  Google Scholar 

  22. Vega-Pons, S., Ruiz-Shulcloper, J.: A survey of clustering ensemble algorithms. Int. J. Pattern Recogn. Artif. Intell. 25(3), 337–372 (2011)

    Article  MathSciNet  Google Scholar 

  23. Verykios, V.S., Elmagarmid, A.K., Houstis, E.N.: Automating the approximate record-matching process. Inf. Sci. 126(1–4), 83–98 (2000)

    Article  MATH  Google Scholar 

  24. Wang, Z., Li, J., Zhao, Y., Setchi, R., Tang, J.: A unified approach to matching semantic data on the web. Knowl. Based Syst. 39, 173–184 (2013)

    Article  Google Scholar 

  25. Xu, R., Wunsch II, D.C.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)

    Article  Google Scholar 

  26. Zhao, Y., Karypis, G.: Empirical and theoretical comparisons of selected criterion functions for document clustering. Mach. Learn. 55(3), 311–331 (2004)

    Article  MATH  Google Scholar 

  27. Zhou, Y., Cheng, H., Yu, J.X.: Graph clustering based on structural/attribute similarities. Proc. VLDB Endow. 2(1), 718–729 (2009)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Stefano Montanelli .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ferrara, A., Genta, L., Montanelli, S., Castano, S. (2015). Dimensional Clustering of Linked Data: Techniques and Applications. In: Hameurlain, A., Küng, J., Wagner, R., Bianchini, D., De Antonellis, V., De Virgilio, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XIX. Lecture Notes in Computer Science(), vol 8990. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46562-2_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-662-46562-2_3

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46561-5

  • Online ISBN: 978-3-662-46562-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics