Improving Clustering-Based Schema Matching Using Latent Semantic Indexing

Algergawy, Alsayed; Moawed, Seham; Sarhan, Amany; Eldosouky, Ali; Saake, Gunter

doi:10.1007/978-3-662-45761-0_4

Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 8920)

Abstract

The growing size and widespread use of XML data and of different types of ontologies make integrating these data a major challenge. A critical step in such integration is identifying semantically corresponding elements across heterogeneous data sets. This identification becomes increasingly difficult for large schemas and ontologies. Clustering-based matching substantially reduces the search space and thereby improves matching efficiency. However, current methods for identifying similar clusters rely on literal term matching. To maintain high matching quality along with high matching efficiency, hidden semantic relationships among the clusters' elements should be discovered. To this end, we propose a Latent Semantic Indexing-based approach that captures the conceptual similarity between clusters. Experimental evaluation shows that the proposed approach yields encouraging and significant improvements towards building large-scale matching approaches.
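
As a rough illustration of the general idea (not the authors' implementation, which is not shown on this page), Latent Semantic Indexing can be sketched as a truncated SVD of a term-by-cluster matrix followed by cosine similarity in the reduced space. Everything named below is an assumption made for the example: the function `lsi_cluster_similarity`, the representation of a cluster as a bag of element-name tokens, the raw-count weighting, the toy clusters, and the rank `k=2`.

```python
import numpy as np


def lsi_cluster_similarity(clusters_a, clusters_b, k=2):
    """Cosine similarity between clusters of two schemas in a rank-k LSI space.

    Each cluster is a list of tokens (hypothetical representation).
    Returns an array of shape (len(clusters_a), len(clusters_b)).
    """
    docs = list(clusters_a) + list(clusters_b)
    vocab = sorted({tok for doc in docs for tok in doc})
    row = {tok: i for i, tok in enumerate(vocab)}

    # Term-by-cluster matrix with raw term counts (a weighting such as
    # TF-IDF could be used instead; raw counts keep the sketch short).
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            A[row[tok], j] += 1.0

    # Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))

    # Coordinates of each cluster in the k-dimensional latent space.
    coords = (s[:k, None] * Vt[:k]).T  # shape: (n_clusters, k)

    # Normalise rows so that a dot product equals the cosine similarity.
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0
    unit = coords / norms

    n_a = len(clusters_a)
    return unit[:n_a] @ unit[n_a:].T


# Toy clusters (hypothetical data, not taken from the chapter).
clusters_a = [["author", "name", "book"], ["price", "cost", "currency"]]
clusters_b = [["writer", "name", "title"], ["amount", "price"]]

sim = lsi_cluster_similarity(clusters_a, clusters_b, k=2)
print(np.round(sim, 3))
```

In this toy input the first clusters of each schema share only the token `name`, so their cosine similarity in the raw term space is about 0.33. After projection onto two latent dimensions they fall on the same axis and score close to 1, while the unrelated pairs score close to 0. This is the kind of non-literal correspondence between clusters that the abstract refers to.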

Notes

1. http://www.w3.org/TR/xquery/
2. http://msdn.microsoft.com/en-us/library/ee265410(v=bts.10).aspx
3. https://xsom.java.net
4. http://www.w3.org/TR/xmlschema-2/
5. XML Schema - Data Types Quick Reference, http://www.xml.dvint.com/
6. http://queens.db.toronto.edu/project/clio/index.php#testschemas

References

1. Abiteboul, S., Suciu, D., Buneman, P.: Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, San Francisco (2000)
2. Algergawy, A., Massmann, S., Rahm, E.: A clustering-based approach for large-scale ontology matching. In: Eder, J., Bielikova, M., Tjoa, A.M. (eds.) ADBIS 2011. LNCS, vol. 6909, pp. 415–428. Springer, Heidelberg (2011)
3. Algergawy, A., Nayak, R., Saake, G.: Element similarity measures in XML schema matching. Inf. Sci. 180(24), 4975–4998 (2010)
4. Algergawy, A., Nayak, R., Siegmund, N., Köppen, V., Saake, G.: Combining schema and level-based matching for web service discovery. In: Benatallah, B., Casati, F., Kappel, G., Rossi, G. (eds.) ICWE 2010. LNCS, vol. 6189, pp. 114–128. Springer, Heidelberg (2010)
5. Algergawy, A., Schallehn, E., Saake, G.: Improving XML schema matching using Prüfer sequences. DKE 68(8), 728–747 (2009)
6. Aslan, G., McLeod, D.: Semantic heterogeneity resolution in federated databases by metadata implantation and stepwise evolution. VLDB J. 8(2), 120–132 (1999)
7. Bellahsene, Z., Bonifati, A., Rahm, E.: Schema Matching and Mapping. Springer, Heidelberg (2011)
8. Berry, M.W., Drmac, Z., Jessup, E.R.: Matrices, vector spaces, and information retrieval. SIAM Rev. 41(2), 335–362 (1999)
9. Bonifati, A., Mecca, G., Pappalardo, A., Raunich, S., Summa, G.: Schema mapping verification: the spicy way. In: EDBT 2008, France, pp. 85–96 (2008)
10. Chiticariu, L., Hernández, M.A., Kolaitis, P.G., Popa, L.: Semi-automatic schema integration in Clio. In: VLDB'07, pp. 1326–1329 (2007)
11. Cohen, W.W., Ravikumar, P., Fienberg, S.E.: A comparison of string distance metrics for name-matching tasks. In: IIWeb, pp. 73–78 (2003)
12. Deerwester, S., Dumais, S.T., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 391–407 (1990)
13. Do, H.H., Melnik, S., Rahm, E.: Comparison of schema matching evaluations. In: The 2nd International Workshop on Web Databases (2002)
14. Do, H.H., Rahm, E.: Matching large schemas: approaches and evaluation. Inf. Syst. 32(6), 857–885 (2007)
15. Doan, A., Halevy, A.: Semantic integration research in the database community: a brief survey. AAAI AI Mag. 25(1), 83–94 (2005)
16. Doan, A., Halevy, A.Y., Ives, Z.G.: Principles of Data Integration. Morgan Kaufmann, San Francisco (2012)
17. Ehrig, M., Staab, S.: QOM – quick ontology mapping. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 683–697. Springer, Heidelberg (2004)
18. Halevy, A.Y., Ives, Z.G., Suciu, D., Tatarinov, I.: Schema mediation in peer data management systems. In: 19th International Conference on Data Engineering, pp. 505–516 (2003)
19. Hamdi, F., Safar, B., Reynaud, C., Zargayouna, H.: Alignment-based partitioning of large-scale ontologies. In: Guillet, F., Ritschard, G., Zighed, D.A., Briand, H. (eds.) Advances in Knowledge Discovery and Management. SCI, vol. 292, pp. 251–269. Springer, Heidelberg (2010)
20. Hao, Y., Zhang, Y.: Web services discovery based on schema matching. In: ACSC 2007, pp. 107–113 (2007)
21. Hu, W., Qu, Y., Cheng, G.: Matching large ontologies: a divide-and-conquer approach. DKE 67, 140–160 (2008)
22. Landauer, T.: Handbook of Latent Semantic Analysis. Lawrence Erlbaum, Mahwah (2007)
23. Lee, D., Chu, W.W.: Comparative analysis of six XML schema languages. SIGMOD Rec. 29(3), 76–87 (2000)
24. Lee, M.L., Yang, L.H., Hsu, W., Yang, X.: XClust: clustering XML schemas for effective integration. In: CIKM'02, pp. 63–74 (2002)
25. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
26. Moawed, S., Algergawy, A., Sarhan, A., Eldosouky, A., Saake, G.: A latent semantic indexing-based approach to determine similar clusters in large-scale schema matching. In: Catania, B., et al. (eds.) New Trends in Databases and Information Systems. AISC, vol. 241, pp. 267–276. Springer, Heidelberg (2014)
27. Peukert, E., Berthold, H., Rahm, E.: Rewrite techniques for performance optimization of schema matching processes. In: EDBT, pp. 453–464 (2010)
28. Peukert, E., Eberius, J., Rahm, E.: A self-configuring schema matching system. In: 28th International Conference on Data Engineering (ICDE), 2012, pp. 306–317 (2012)
29. Peukert, E., Massmann, S., Konig, K.: Comparing similarity combination methods for schema matching. In: GI-Workshop, pp. 692–701 (2010)
30. Rahm, E.: Towards large-scale schema and ontology matching. In: Bellahsene, Z., Bonifati, A., Rahm, E. (eds.) Schema Matching and Mapping. Data-Centric Systems and Applications, pp. 3–27. Springer, Heidelberg (2011)
31. Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB J. 10(4), 334–350 (2001)
32. Seddiqui, M.H., Aono, M.: An efficient and scalable algorithm for segmented alignment of ontologies of arbitrary size. Web Semant. 7(4), 344–356 (2009)
33. Shvaiko, P., Euzenat, J.: Ontology matching: state of the art and future challenges. IEEE Trans. Knowl. Data Eng. 25(1), 158–176 (2013)
34. Thuy, P.: Hybrid similarity measure for XML data integration and transformation. Ph.D. thesis, Seoul, Korea (2012)
35. Wang, Z., Wang, Y., Zhang, S.-S., Shen, G., Du, T.: Matching large scale ontology effectively. In: Mizoguchi, R., Shi, Z.-Z., Giunchiglia, F. (eds.) ASWC 2006. LNCS, vol. 4185, pp. 99–105. Springer, Heidelberg (2006)
36. Zhong, Q., Li, H., Li, J., Xie, G.T., Tang, J., Zhou, L., Pan, Y.: A Gauss function based approach for unbalanced ontology matching. In: ACM SIGMOD International Conference on Management of Data (SIGMOD 2009), pp. 669–680 (2009)

Acknowledgments

This paper is a revised and extended version of the paper presented in [26]. A. Algergawy partially worked on this paper while at Magdeburg University.

Author information

Authors and Affiliations

Institute of Computer Science, Friedrich Schiller University of Jena, Jena, Germany
Alsayed Algergawy
Department of Computer Engineering, Tanta University, Tanta, Egypt
Alsayed Algergawy & Amany Sarhan
Department of Computer Engineering, Mansoura University, Mansoura, Egypt
Seham Moawed & Ali Eldosouky
Department of Computer Science, University of Magdeburg, Magdeburg, Germany
Gunter Saake

Corresponding author

Correspondence to Alsayed Algergawy.

Editor information

Editors and Affiliations

IRIT, Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
FAW, University of Linz, Linz, Austria
Josef Küng
FAW, University of Linz, Linz, Austria
Roland Wagner
University of Genoa, Genoa, Italy
Barbara Catania
University of Genoa, Genoa, Italy
Giovanna Guerrini
LIPADE, Paris Descartes University, Paris, France
Themis Palpanas
Charles University, Prague, Czech Republic
Jaroslav Pokorný
Aristotle University of Thessaloniki, Thessaloniki, Greece
Athena Vakali

About this chapter

Cite this chapter

Algergawy, A., Moawed, S., Sarhan, A., Eldosouky, A., Saake, G. (2014). Improving Clustering-Based Schema Matching Using Latent Semantic Indexing. In: Hameurlain, A., et al. Transactions on Large-Scale Data- and Knowledge-Centered Systems XV. Lecture Notes in Computer Science, vol 8920. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45761-0_4

DOI: https://doi.org/10.1007/978-3-662-45761-0_4
Published: 12 December 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45760-3
Online ISBN: 978-3-662-45761-0
eBook Packages: Computer Science; Computer Science (R0)
