Abstract
Most schema matching tools are assembled from multiple match algorithms, each employing a particular technique to improve matching accuracy, which makes matching systems extensible and customizable to a particular domain. Current tools aggregate the results obtained by several match algorithms to improve the quality of the discovered matches. However, aggregation entails several drawbacks. Recently, it has been pointed out that the main issue is how to select the most suitable match algorithms to execute for a given domain and how to adjust the multiple knobs (e.g., thresholds, performance, quality). In this article, we present a novel method for selecting the most appropriate schema matching algorithms. The matching engine uses a decision tree to combine the most appropriate match algorithms. The first consequence of using a decision tree is improved performance: only a subset of the match algorithms is used during the matching process, so the complexity is bounded by the height of the decision tree. The second advantage is improved match quality, since for a given domain only the most suitable match algorithms are used. The experiments show the effectiveness of our approach with respect to other matching tools.
Supported by ANR Research Grant ANR-05-MMSA-0007.
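The decision-tree idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three similarity measures, the thresholds, and the tree shape below are hypothetical stand-ins chosen for illustration. Each internal node names one match algorithm and a threshold, each leaf holds a match/no-match decision, and only the measures on the path actually followed are computed, so the number of measures evaluated per element pair cannot exceed the height of the tree.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple, Union


# Hypothetical similarity measures, each returning a score in [0, 1].
def exact(a: str, b: str) -> float:
    return 1.0 if a.lower() == b.lower() else 0.0


def edit_ratio(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


MEASURES: Dict[str, Callable[[str, str], float]] = {
    "exact": exact,
    "edit_ratio": edit_ratio,
    "token_jaccard": token_jaccard,
}


@dataclass
class Leaf:
    is_match: bool


@dataclass
class Node:
    measure: str                 # key into MEASURES
    threshold: float             # assumed value, not taken from the paper
    low: Union["Node", Leaf]     # followed when score < threshold
    high: Union["Node", Leaf]    # followed when score >= threshold


def decide(tree: Union[Node, Leaf], a: str, b: str) -> Tuple[bool, List[str]]:
    """Walk the tree, computing only the measures met along the path."""
    used: List[str] = []
    node = tree
    while isinstance(node, Node):
        score = MEASURES[node.measure](a, b)
        used.append(node.measure)
        node = node.high if score >= node.threshold else node.low
    return node.is_match, used


# A hand-written tree of height 3; a real system would learn or tune it.
TREE = Node(
    "exact", 1.0,
    low=Node(
        "edit_ratio", 0.8,
        low=Node("token_jaccard", 0.5, low=Leaf(False), high=Leaf(True)),
        high=Leaf(True),
    ),
    high=Leaf(True),
)

if __name__ == "__main__":
    for pair in [("Name", "name"), ("customer_name", "cust_name"), ("price", "author")]:
        print(pair, decide(TREE, *pair))
```

In this sketch the cheapest measure sits at the root, so pairs that match exactly never trigger the more expensive measures. That ordering is a design choice of the example, not a claim about the paper's trees.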
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Duchateau, F., Bellahsene, Z., Coletta, R. (2008). A Flexible Approach for Planning Schema Matching Algorithms. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems: OTM 2008. OTM 2008. Lecture Notes in Computer Science, vol 5331. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88871-0_18
DOI: https://doi.org/10.1007/978-3-540-88871-0_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88870-3
Online ISBN: 978-3-540-88871-0
eBook Packages: Computer Science, Computer Science (R0)