Improving Clustering-Based Schema Matching Using Latent Semantic Indexing

  • Alsayed Algergawy
  • Seham Moawed
  • Amany Sarhan
  • Ali Eldosouky
  • Gunter Saake
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8920)


The growing size and widespread use of XML data and of different types of ontologies make integrating these data a major challenge. A critical step in such integration is identifying semantically corresponding elements across heterogeneous data sets, and this step becomes harder as schemas and ontologies grow larger. Clustering-based matching reduces the search space considerably and thereby improves matching efficiency. However, current methods for identifying similar clusters rely on literal term matching. To maintain high matching quality together with high matching efficiency, the hidden semantic relationships among the clusters' elements should be discovered. To this end, we propose a Latent Semantic Indexing-based approach that captures the conceptual similarity between clusters. The experimental evaluation shows that the proposed approach yields encouraging and significant improvements toward building large-scale matching approaches.
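As a rough illustration of the general idea, and not the authors' implementation, the following sketch treats each cluster as a bag of element-name tokens, builds a term-by-cluster matrix, applies a truncated SVD, and compares clusters by cosine similarity in the reduced space. The cluster names and contents, the raw-count weighting, the rank k = 2, and all function names are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical clusters from two schemas S1 and S2 (tokens are made up).
clusters = {
    "S1.c1": ["author", "book"],
    "S2.c1": ["book", "writer"],
    "S2.c2": ["writer", "novel"],
    "S1.c2": ["price", "order"],
    "S2.c3": ["order", "invoice"],
}

def term_cluster_matrix(clusters):
    """Build a raw term-frequency matrix with terms as rows, clusters as columns."""
    vocab = sorted({t for tokens in clusters.values() for t in tokens})
    row = {t: i for i, t in enumerate(vocab)}
    names = list(clusters)
    A = np.zeros((len(vocab), len(names)))
    for j, name in enumerate(names):
        for t in clusters[name]:
            A[row[t], j] += 1.0
    return A, vocab, names

def lsi_cluster_vectors(A, k):
    """Represent each cluster in a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return (np.diag(s[:k]) @ Vt[:k, :]).T  # one row per cluster

def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty vectors
    Xn = X / norms
    return Xn @ Xn.T

A, vocab, names = term_cluster_matrix(clusters)
raw = cosine_matrix(A.T)                         # literal term overlap only
lsi = cosine_matrix(lsi_cluster_vectors(A, k=2)) # similarity in latent space

i, j = names.index("S1.c1"), names.index("S2.c2")
print("raw cosine:", round(float(raw[i, j]), 3))
print("LSI cosine:", round(float(lsi[i, j]), 3))
```

In this toy data, S1.c1 and S2.c2 share no term, so their literal cosine similarity is zero, yet both overlap with S2.c1, so their latent-space similarity comes out close to 1. A real system would likely use weighted terms (for example TF-IDF) and a rank chosen empirically.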


Keywords: Schema matching; Large-scale matching; Latent semantic indexing; Partitioning-based matching; Hierarchical clustering method; Vector Space Model (VSM); Document similarity



This paper is a revised and extended version of the paper presented in [26]. A. Algergawy partially worked on this paper while at Magdeburg University.


References

  1. Abiteboul, S., Suciu, D., Buneman, P.: Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, San Francisco (2000)
  2. Algergawy, A., Massmann, S., Rahm, E.: A clustering-based approach for large-scale ontology matching. In: Eder, J., Bielikova, M., Tjoa, A.M. (eds.) ADBIS 2011. LNCS, vol. 6909, pp. 415–428. Springer, Heidelberg (2011)
  3. Algergawy, A., Nayak, R., Saake, G.: Element similarity measures in XML schema matching. Inf. Sci. 180(24), 4975–4998 (2010)
  4. Algergawy, A., Nayak, R., Siegmund, N., Köppen, V., Saake, G.: Combining schema and level-based matching for web service discovery. In: Benatallah, B., Casati, F., Kappel, G., Rossi, G. (eds.) ICWE 2010. LNCS, vol. 6189, pp. 114–128. Springer, Heidelberg (2010)
  5. Algergawy, A., Schallehn, E., Saake, G.: Improving XML schema matching using Prüfer sequences. DKE 68(8), 728–747 (2009)
  6. Aslan, G., McLeod, D.: Semantic heterogeneity resolution in federated databases by metadata implantation and stepwise evolution. VLDB J. 8(2), 120–132 (1999)
  7. Bellahsene, Z., Bonifati, A., Rahm, E.: Schema Matching and Mapping. Springer, Heidelberg (2011)
  8. Berry, M.W., Drmac, Z., Jessup, E.R.: Matrices, vector spaces, and information retrieval. SIAM Rev. 41(2), 335–362 (1999)
  9. Bonifati, A., Mecca, G., Pappalardo, A., Raunich, S., Summa, G.: Schema mapping verification: the spicy way. In: EDBT 2008, France, pp. 85–96 (2008)
  10. Chiticariu, L., Hernández, M.A., Kolaitis, P.G., Popa, L.: Semi-automatic schema integration in Clio. In: VLDB’07, pp. 1326–1329 (2007)
  11. Cohen, W.W., Ravikumar, P., Fienberg, S.E.: A comparison of string distance metrics for name-matching tasks. In: IIWeb, pp. 73–78 (2003)
  12. Deerwester, S., Dumais, S.T., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 391–407 (1990)
  13. Do, H.H., Melnik, S., Rahm, E.: Comparison of schema matching evaluations. In: The 2nd International Workshop on Web Databases (2002)
  14. Do, H.H., Rahm, E.: Matching large schemas: approaches and evaluation. Inf. Syst. 32(6), 857–885 (2007)
  15. Doan, A., Halevy, A.: Semantic integration research in the database community: a brief survey. AAAI AI Mag. 25(1), 83–94 (2005)
  16. Doan, A., Halevy, A.Y., Ives, Z.G.: Principles of Data Integration. Morgan Kaufmann, San Francisco (2012)
  17. Ehrig, M., Staab, S.: QOM – quick ontology mapping. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 683–697. Springer, Heidelberg (2004)
  18. Halevy, A.Y., Ives, Z.G., Suciu, D., Tatarinov, I.: Schema mediation in peer data management systems. In: 19th International Conference on Data Engineering, pp. 505–516 (2003)
  19. Hamdi, F., Safar, B., Reynaud, C., Zargayouna, H.: Alignment-based partitioning of large-scale ontologies. In: Guillet, F., Ritschard, G., Zighed, D.A., Briand, H. (eds.) Advances in Knowledge Discovery and Management. SCI, vol. 292, pp. 251–269. Springer, Heidelberg (2010)
  20. Hao, Y., Zhang, Y.: Web services discovery based on schema matching. In: ACSC 2007, pp. 107–113 (2007)
  21. Hu, W., Qu, Y., Cheng, G.: Matching large ontologies: a divide-and-conquer approach. DKE 67, 140–160 (2008)
  22. Landauer, T.: Handbook of Latent Semantic Analysis. Lawrence Erlbaum, Mahwah (2007)
  23. Lee, D., Chu, W.W.: Comparative analysis of six XML schema languages. SIGMOD Rec. 9(3), 76–87 (2000)
  24. Lee, M.L., Yang, L.H., Hsu, W., Yang, X.: Xclust: clustering XML schemas for effective integration. In: CIKM’02, pp. 63–74 (2002)
  25. Manning, C.D., Raghavan, P., Schutze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
  26. Moawed, S., Algergawy, A., Sarhan, A., Eldosouky, A., Saake, G.: A latent semantic indexing-based approach to determine similar clusters in large-scale schema matching. In: Catania, B., et al. (eds.) New Trends in Databases and Information Systems. AISC, vol. 241, pp. 267–276. Springer, Heidelberg (2014)
  27. Peukert, E., Berthold, H., Rahm, E.: Rewrite techniques for performance optimization of schema matching processes. In: EDBT, pp. 453–464 (2010)
  28. Peukert, E., Eberius, J., Rahm, E.: A self-configuring schema matching system. In: 28th International Conference on Data Engineering (ICDE), pp. 306–317 (2012)
  29. Peukert, E., Massmann, S., Konig, K.: Comparing similarity combination methods for schema matching. In: GI-Workshop, pp. 692–701 (2010)
  30. Rahm, E.: Towards large-scale schema and ontology matching. In: Bellahsene, Z., Bonifati, A., Rahm, E. (eds.) Schema Matching and Mapping. Data-Centric Systems and Applications, pp. 3–27. Springer, Heidelberg (2011)
  31. Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB J. 10(4), 334–350 (2001)
  32. Seddiquia, M.H., Aono, M.: An efficient and scalable algorithm for segmented alignment of ontologies of arbitrary size. Web Semant. 7(4), 344–356 (2009)
  33. Shvaiko, P., Euzenat, J.: Ontology matching: state of the art and future challenges. IEEE Trans. Knowl. Data Eng. 25(1), 158–176 (2013)
  34. Thuy, P.: Hybrid similarity measure for XML data integration and transformation. Ph.D. thesis, Seoul, Korea (2012)
  35. Wang, Z., Wang, Y., Zhang, S.-S., Shen, G., Du, T.: Matching large scale ontology effectively. In: Mizoguchi, R., Shi, Z.-Z., Giunchiglia, F. (eds.) ASWC 2006. LNCS, vol. 4185, pp. 99–105. Springer, Heidelberg (2006)
  36. Zhong, Q., Li, H., Li, J., Xie, G.T., Tang, J., Zhou, L., Pan, Y.: A Gauss function based approach for unbalanced ontology matching. In: ACM SIGMOD International Conference on Management of Data (SIGMOD 2009), pp. 669–680 (2009)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Alsayed Algergawy (1, 2), corresponding author
  • Seham Moawed (3)
  • Amany Sarhan (2)
  • Ali Eldosouky (3)
  • Gunter Saake (4)

  1. Institute of Computer Science, Friedrich Schiller University of Jena, Jena, Germany
  2. Department of Computer Engineering, Tanta University, Tanta, Egypt
  3. Department of Computer Engineering, Mansoura University, Mansoura, Egypt
  4. Department of Computer Science, University of Magdeburg, Magdeburg, Germany
