Advertisement

YAM: A Step Forward for Generating a Dedicated Schema Matcher

  • Fabien DuchateauEmail author
  • Zohra Bellahsene
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9620)

Abstract

Discovering correspondences between schema elements is a crucial task for data integration. Most schema matching tools are semi-automatic, e.g., an expert must tune certain parameters (thresholds, weights, etc.). They mainly use aggregation methods to combine similarity measures. The tuning of a matcher, especially for its aggregation function, has a strong impact on the matching quality of the resulting correspondences, and makes it difficult to integrate a new similarity measure or to match specific domain schemas. In this paper, we present YAM (Yet Another Matcher), a matcher factory which enables the generation of a dedicated schema matcher for a given schema matching scenario. For this purpose we have formulated the schema matching task as a classification problem. Based on this machine learning framework, YAM automatically selects and tunes the best method to combine similarity measures (e.g., a decision tree, an aggregation function). In addition, we describe how user inputs, such as a preference between recall or precision, can be closely integrated during the generation of the dedicated matcher. Many experiments run against matchers generated by YAM and traditional matching tools confirm the benefits of a matcher factory and the significant impact of user preferences.

Keywords

Schema matching Data integration Matcher factory Schema matcher Machine learning Classification 

References

  1. 1.
    Altschul, S.F., Erickson, B.W.: Optimal sequence alignment using affine gap costs. Bull. Math. Biol. 48(5–6), 603–616 (1986)CrossRefMathSciNetzbMATHGoogle Scholar
  2. 2.
    Aumueller, D., Do, H.-H., Massmann, S., Rahm, E.: Schema and ontology matching with coma++. In: SIGMOD, pp. 906–908 (2005)Google Scholar
  3. 3.
    Bellahsene, Z., Bonifati, A., Rahm, E. (eds.): Schema Matching and Mapping. Springer, Heidelberg (2011)zbMATHGoogle Scholar
  4. 4.
    Berlin, J., Motro, A.: Autoplex: automated discovery of content for virtual databases. In: Batini, C., Giunchiglia, F., Giorgini, P., Mecella, M. (eds.) CoopIS 2001. LNCS, vol. 2172, pp. 108–122. Springer, Heidelberg (2001)CrossRefGoogle Scholar
  5. 5.
    Berlin, J., Motro, A.: Database schema matching using machine learning with feature selection. In: Pidduck, A.B., Mylopoulos, J., Woo, C.C., Ozsu, M.T. (eds.) CAiSE 2002. LNCS, vol. 2348, p. 452. Springer, Heidelberg (2002)CrossRefGoogle Scholar
  6. 6.
    Bernstein, P.A., Madhavan, J., Rahm, E.: Generic schema matching, ten years later. PVLDB 4(11), 695–701 (2011)Google Scholar
  7. 7.
    Chapelle, O., Schölkopf, B., Zien, A. (eds.): Semi-supervised Learning. MIT Press, Cambridge (2006)Google Scholar
  8. 8.
    Cohen, W., Ravikumar, P., Fienberg, S.: A comparison of string distance metrics for name-matching tasks. In: Proceedings of the IJCAI 2003 (2003)Google Scholar
  9. 9.
    Cruz, I.F., Antonelli, F.P., Stroe, C.: AgreementMaker: efficient matching for large real-world schemas and ontologies. PVLDB 2(2), 1586–1589 (2009)Google Scholar
  10. 10.
    Djeddi, W.E., Khadir, M.T.: Ontology alignment using artificial neural network for large-scale ontologies. Int. J. Metadata Semant. Ontol. 8(1), 75–92 (2013)CrossRefGoogle Scholar
  11. 11.
    Do, H.H., Rahm, E.: Coma - a system for flexible combination of schema matching approaches. In: VLDB, pp. 610–621 (2002)Google Scholar
  12. 12.
    Doan, A., Domingos, P., Halevy, A.Y.: Reconciling schemas of disparate data sources: a machine-learning approach. In: SIGMOD, pp. 509–520 (2001)Google Scholar
  13. 13.
    Doan, A.H., Madhavan, J., Dhamankar, R., Domingos, P., Halevy, A.Y.: Learning to match ontologies on the semantic web. VLDB J. 12(4), 303–319 (2003)CrossRefGoogle Scholar
  14. 14.
    Doan, A., Madhavan, J., Domingos, P., Halevy, A.: Ontology matching: a machine learning approach. In: Staab, S., Studer, R. (eds.) Handbook on Ontologies in Information Systems, pp. 397–416. Springer, Heidelberg (2004)Google Scholar
  15. 15.
    Dougherty, J., Kohavi, R., Sahami, M., et al.: Supervised and unsupervised discretization of continuous features. In: Proceedings of 12th International Conference on Machine Learning, vol. 12, 194–202 (1995)Google Scholar
  16. 16.
    Dragut, E., Lawrence, R.: Composing mappings between schemas using a reference ontology. In: Meersman, R. (ed.) OTM 2004. LNCS, vol. 3290, pp. 783–800. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  17. 17.
    Duchateau, F., Bellahsene, Z.: Designing a benchmark for the assessmentof schema matching tools. Open J. Databases (OJDB) 1, 3–25 (2014). RonPub, GermanyGoogle Scholar
  18. 18.
    Duchateau, F., Bellahsene, Z., Coletta, R.: A flexible approach for planning schema matching algorithms. In: Meersman, R., Tari, Z. (eds.) OTM 2008, Part I. LNCS, vol. 5331, pp. 249–264. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  19. 19.
    Duchateau, F., Bellahsene, Z., Roche, M.: A context-based measure for discovering approximate semantic matching between schema elements. In: Research Challenges in Information Science (RCIS) (2007)Google Scholar
  20. 20.
    Euzenat, J., Shvaiko, P.: Ontology Matching. Springer, Heidelberg (2007)zbMATHGoogle Scholar
  21. 21.
    Fayyad, U.M., Irani, K.B.: On the handling of continuous-valued attributes in decision tree generation. Mach. Learn. 8(1), 87–102 (1992)zbMATHGoogle Scholar
  22. 22.
    Gal, A.: Uncertain Schema Matching. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, San Rafael (2011)CrossRefzbMATHGoogle Scholar
  23. 23.
    Garner, S.R.: Weka: the waikato environment for knowledge analysis. In: Proceedings of the New Zealand Computer Science Research Students Conference, pp. 57–64 (1995)Google Scholar
  24. 24.
    Hammer, J., Stonebraker, M., Topsakal, O.: Thalia: test harness for the assessment of legacy information integration approaches. In: ICDE, pp. 485–486 (2005)Google Scholar
  25. 25.
    Hliaoutakis, A., Varelas, G., Voutsakis, E., Petrakis, E.G.M., Milios, E.: Information retrieval by semantic similarity. Int. J. Seman. Web Inf. Syst. 2(3), 55–73 (2006)CrossRefGoogle Scholar
  26. 26.
    Köpcke, H., Rahm, E.: Training selection for tuning entity matching. In: QDB/MUD, pp. 3–12 (2008)Google Scholar
  27. 27.
    Lee, Y., Sayyadian, M., Doan, A.H., Rosenthal, A.: eTuner: tuning schema matching software using synthetic scenarios. VLDB J. 16(1), 97–122 (2007)CrossRefGoogle Scholar
  28. 28.
    Li, J., Tang, J., Li, Y., Luo, Q.: Rimom: a dynamic multistrategy ontology alignment framework. IEEE Trans. Knowl. Data Eng. 21(8), 1218–1232 (2009)CrossRefGoogle Scholar
  29. 29.
    Lin, D.: An information-theoretic definition of similarity. In: ICML 1998, pp. 296–304 (1998)Google Scholar
  30. 30.
    Malgorzata, M., Anja, J., Jérôme, E.: Applying an analytic method for matching approach selection. In: CEUR Workshop Proceedings of Ontology Matching, vol. 225. CEUR-WS.org (2006)Google Scholar
  31. 31.
    Marie, A., Gal, A.: Boosting schema matchers. In: Meersman, R., Tari, Z. (eds.) OTM 2008, Part I. LNCS, vol. 5331, pp. 283–300. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  32. 32.
    Melnik, S., Garcia-Molina, H., Rahm, E.: Similarity flooding: aversatile graph matching algorithm and its application to schema matching. In: Proceedings of ICDE, pp. 117–128 (2002)Google Scholar
  33. 33.
    Melnik, S., Rahm, E., Bernstein, P.A.: Developing metadata-intensive applications with Rondo. J. Web Seman. I, 47–74 (2003)CrossRefGoogle Scholar
  34. 34.
    Mitchell, T.: Machine Learning. McGraw-Hill Education, New York (1997). (ISE Editions)zbMATHGoogle Scholar
  35. 35.
    Mork, P., Seligman, L., Rosenthal, A., Korb, J., Wolf, C.: The harmony integration workbench. J. Data Seman. 11, 65–93 (2008)Google Scholar
  36. 36.
    Needleman, S., Wunsch, C.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)CrossRefGoogle Scholar
  37. 37.
    University of Illinois: The UIUC web integration repository (2003). http://metaquerier.cs.uiuc.edu/repository
  38. 38.
    Paulheim, H., Hertling, S., Ritze, D.: Towards evaluating interactive ontology matching tools. In: Cimiano, P., Corcho, O., Presutti, V., Hollink, L., Rudolph, S. (eds.) ESWC 2013. LNCS, vol. 7882, pp. 31–45. Springer, Heidelberg (2013)CrossRefGoogle Scholar
  39. 39.
    Peukert, E., Eberius, J., Rahm, E.: Rule-based construction of matching processes. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM 2011, New York, pp. 2421–2424. ACM (2011)Google Scholar
  40. 40.
    Resnik, P.: Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intell. Res. 11, 95–130 (1999)zbMATHGoogle Scholar
  41. 41.
  42. 42.
    Shvaiko, P., Euzenat, J.: A survey of schema-based matching approaches. In: Spaccapietra, S. (ed.) Journal on Data Semantics IV. LNCS, vol. 3730, pp. 146–171. Springer, Heidelberg (2005)CrossRefGoogle Scholar
  43. 43.
    Shvaiko, P., Euzenat, J.: Ten challenges for ontology matching. In: Meersman, R., Tari, Z. (eds.) OTM 2008, Part II. LNCS, vol. 5332, pp. 1164–1182. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  44. 44.
    Smith, K., Morse, M., Mork, P., Li, M., Rosenthal, A., Allen, D., Seligman, L.: The role of schema matching in large enterprises. In: CIDR (2009)Google Scholar
  45. 45.
    Winkler, W.E.: String comparator metrics and enhanced decision rules in the fellegi-sunter model of record linkage. In: Proceedings of the Section on Survey Research, pp. 354–359 (1990)Google Scholar
  46. 46.
    Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Philip, Y.S., et al.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37 (2008)CrossRefGoogle Scholar
  47. 47.
    Xu, L., Embley, D.W.: Using domain ontologies to discover direct and indirect matches for schema elements, pp. 97–103 (2003)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. 1.Université Lyon 1, LIRIS UMR 5205LyonFrance
  2. 2.Université Montpellier, LIRMMMontpellierFrance

Personalised recommendations