Improving Clustering-Based Schema Matching Using Latent Semantic Indexing

Transactions on Large-Scale Data- and Knowledge-Centered Systems XV

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 8920)

Abstract

The growing size and widespread use of XML data and of different types of ontologies make integrating these data a major challenge. A critical step towards such integration is to identify semantically corresponding elements across heterogeneous data sets. This identification becomes increasingly difficult when dealing with large schemas and ontologies. Clustering-based matching reduces the search space considerably and thus improves matching efficiency. However, current methods for identifying similar clusters rely on literal term matching. To maintain high matching quality along with high matching efficiency, hidden semantic relationships among the clusters' elements should be discovered. To this end, we propose a Latent Semantic Indexing-based approach that retrieves the conceptual meaning shared between clusters. Experimental evaluations indicate that the proposed approach yields encouraging and significant improvements towards building large-scale matching approaches.
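The idea of comparing clusters in a latent space rather than by literal term overlap can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lsi_cluster_similarity`, the raw term-count weighting, the rank `k`, and the toy clusters are all assumptions made for illustration. Each cluster is treated as a bag of element-name tokens, a term-by-cluster matrix is factorized with a truncated SVD, and clusters are compared by cosine similarity in the reduced space.

```python
import numpy as np


def lsi_cluster_similarity(clusters_a, clusters_b, k=2):
    """Cosine similarity between two sets of clusters in a rank-k LSI space.

    Each cluster is a list of tokens (hypothetical element-name terms).
    Returns a matrix of shape (len(clusters_a), len(clusters_b)).
    """
    docs = list(clusters_a) + list(clusters_b)
    vocab = sorted({term for doc in docs for term in doc})
    row = {term: i for i, term in enumerate(vocab)}

    # Term-by-cluster matrix with raw term counts (an assumed weighting).
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            A[row[term], j] += 1.0

    # Truncated SVD: keep only the k largest singular triplets.
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    vecs = (np.diag(s[:k]) @ vt[:k]).T  # one k-dimensional row per cluster

    # Normalize rows so that dot products become cosine similarities.
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0
    unit = vecs / norms

    n_a = len(clusters_a)
    return unit[:n_a] @ unit[n_a:].T


# Toy clusters (invented for illustration only).
source = [["author", "title", "book"], ["price", "currency", "amount"]]
target = [["writer", "title", "book"], ["cost", "currency", "amount"]]
similarity = lsi_cluster_similarity(source, target, k=2)
print(np.round(similarity, 3))
```

In this toy input the first source and first target cluster share only two of their three terms, so their cosine similarity on raw term counts would be 2/3. In the rank-2 space both clusters map to essentially the same direction, which is the kind of latent correspondence that literal term matching misses.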

Notes

  1. http://www.w3.org/TR/xquery/

  2. http://msdn.microsoft.com/en-us/library/ee265410(v=bts.10).aspx

  3. https://xsom.java.net

  4. http://www.w3.org/TR/xmlschema-2/

  5. XML Schema - Data Types Quick Reference, http://www.xml.dvint.com/

  6. http://queens.db.toronto.edu/project/clio/index.php#testschemas

References

  1. Abiteboul, S., Suciu, D., Buneman, P.: Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, San Francisco (2000)

  2. Algergawy, A., Massmann, S., Rahm, E.: A clustering-based approach for large-scale ontology matching. In: Eder, J., Bielikova, M., Tjoa, A.M. (eds.) ADBIS 2011. LNCS, vol. 6909, pp. 415–428. Springer, Heidelberg (2011)

  3. Algergawy, A., Nayak, R., Saake, G.: Element similarity measures in XML schema matching. Inf. Sci. 180(24), 4975–4998 (2010)

  4. Algergawy, A., Nayak, R., Siegmund, N., Köppen, V., Saake, G.: Combining schema and level-based matching for web service discovery. In: Benatallah, B., Casati, F., Kappel, G., Rossi, G. (eds.) ICWE 2010. LNCS, vol. 6189, pp. 114–128. Springer, Heidelberg (2010)

  5. Algergawy, A., Schallehn, E., Saake, G.: Improving XML schema matching using Prüfer sequences. DKE 68(8), 728–747 (2009)

  6. Aslan, G., McLeod, D.: Semantic heterogeneity resolution in federated databases by metadata implantation and stepwise evolution. VLDB J. 8(2), 120–132 (1999)

  7. Bellahsene, Z., Bonifati, A., Rahm, E.: Schema Matching and Mapping. Springer, Heidelberg (2011)

  8. Berry, M.W., Drmac, Z., Jessup, E.R.: Matrices, vector spaces, and information retrieval. SIAM Rev. 41(2), 335–362 (1999)

  9. Bonifati, A., Mecca, G., Pappalardo, A., Raunich, S., Summa, G.: Schema mapping verification: the spicy way. In: EDBT 2008, France, pp. 85–96 (2008)

  10. Chiticariu, L., Hernández, M.A., Kolaitis, P.G., Popa, L.: Semi-automatic schema integration in Clio. In: VLDB’07, pp. 1326–1329 (2007)

  11. Cohen, W.W., Ravikumar, P., Fienberg, S.E.: A comparison of string distance metrics for name-matching tasks. In: IIWeb, pp. 73–78 (2003)

  12. Deerwester, S., Dumais, S.T., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 391–407 (1990)

  13. Do, H.H., Melnik, S., Rahm, E.: Comparison of schema matching evaluations. In: The 2nd International Workshop on Web Databases (2002)

  14. Do, H.H., Rahm, E.: Matching large schemas: approaches and evaluation. Inf. Syst. 32(6), 857–885 (2007)

  15. Doan, A., Halevy, A.: Semantic integration research in the database community: a brief survey. AAAI AI Mag. 25(1), 83–94 (2005)

  16. Doan, A., Halevy, A.Y., Ives, Z.G.: Principles of Data Integration. Morgan Kaufmann, San Francisco (2012)

  17. Ehrig, M., Staab, S.: QOM – quick ontology mapping. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 683–697. Springer, Heidelberg (2004)

  18. Halevy, A.Y., Ives, Z.G., Suciu, D., Tatarinov, I.: Schema mediation in peer data management systems. In: 19th International Conference on Data Engineering, pp. 505–516 (2003)

  19. Hamdi, F., Safar, B., Reynaud, C., Zargayouna, H.: Alignment-based partitioning of large-scale ontologies. In: Guillet, F., Ritschard, G., Zighed, D.A., Briand, H. (eds.) Advances in Knowledge Discovery and Management. SCI, vol. 292, pp. 251–269. Springer, Heidelberg (2010)

  20. Hao, Y., Zhang, Y.: Web services discovery based on schema matching. In: ACSC 2007, pp. 107–113 (2007)

  21. Hu, W., Qu, Y., Cheng, G.: Matching large ontologies: a divide-and-conquer approach. DKE 67, 140–160 (2008)

  22. Landauer, T.: Handbook of Latent Semantic Analysis. Lawrence Erlbaum, Mahwah (2007)

  23. Lee, D., Chu, W.W.: Comparative analysis of six XML schema languages. SIGMOD Rec. 29(3), 76–87 (2000)

  24. Lee, M.L., Yang, L.H., Hsu, W., Yang, X.: Xclust: clustering XML schemas for effective integration. In: CIKM’02, pp. 63–74 (2002)

  25. Manning, C.D., Raghavan, P., Schutze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)

  26. Moawed, S., Algergawy, A., Sarhan, A., Eldosouky, A., Saake, G.: A latent semantic indexing-based approach to determine similar clusters in large-scale schema matching. In: Catania, B., et al. (eds.) New Trends in Databases and Information Systems. AISC, vol. 241, pp. 267–276. Springer, Heidelberg (2014)

  27. Peukert, E., Berthold, H., Rahm, E.: Rewrite techniques for performance optimization of schema matching processes. In: EDBT, pp. 453–464 (2010)

  28. Peukert, E., Eberius, J., Rahm, E.: A self-configuring schema matching system. In: 28th International Conference on Data Engineering (ICDE), 2012, pp. 306–317 (2012)

  29. Peukert, E., Massmann, S., Konig, K.: Comparing similarity combination methods for schema matching. In: GI-Workshop, pp. 692–701 (2010)

  30. Rahm, E.: Towards large-scale schema and ontology matching. In: Bellahsene, Z., Bonifati, A., Rahm, E. (eds.) Schema Matching and Mapping. Data-Centric Systems and Applications, pp. 3–27. Springer, Heidelberg (2011)

  31. Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB J. 10(4), 334–350 (2001)

  32. Seddiqui, M.H., Aono, M.: An efficient and scalable algorithm for segmented alignment of ontologies of arbitrary size. Web Semant. 7(4), 344–356 (2009)

  33. Shvaiko, P., Euzenat, J.: Ontology matching: state of the art and future challenges. IEEE Trans. Knowl. Data Eng. 25(1), 158–176 (2013)

  34. Thuy, P.: Hybrid similarity measure for XML data integration and transformation. Ph.D. thesis, Seoul, Korea (2012)

  35. Wang, Z., Wang, Y., Zhang, S.-S., Shen, G., Du, T.: Matching large scale ontology effectively. In: Mizoguchi, R., Shi, Z.-Z., Giunchiglia, F. (eds.) ASWC 2006. LNCS, vol. 4185, pp. 99–105. Springer, Heidelberg (2006)

  36. Zhong, Q., Li, H., Li, J., Xie, G.T., Tang, J., Zhou, L., Pan, Y.: A Gauss function based approach for unbalanced ontology matching. In: ACM SIGMOD International Conference on Management of Data, (SIGMOD 2009), pp. 669–680 (2009)


Acknowledgments

This paper is a revised and extended version of the paper presented in [26]. A. Algergawy partially worked on this paper while at Magdeburg University.

Author information

Corresponding author

Correspondence to Alsayed Algergawy.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Algergawy, A., Moawed, S., Sarhan, A., Eldosouky, A., Saake, G. (2014). Improving Clustering-Based Schema Matching Using Latent Semantic Indexing. In: Hameurlain, A., et al. Transactions on Large-Scale Data- and Knowledge-Centered Systems XV. Lecture Notes in Computer Science, vol 8920. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45761-0_4

  • DOI: https://doi.org/10.1007/978-3-662-45761-0_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45760-3

  • Online ISBN: 978-3-662-45761-0

  • eBook Packages: Computer Science (R0)
