Abstract
Schema matching is the task of matching between concepts describing the meaning of data in various heterogeneous, distributed data sources. With many heuristics to choose from, several tools have enabled the use of schema matcher ensembles, combining principles by which different schema matchers judge the similarity between concepts. In this work, we investigate means of estimating the uncertainty involved in schema matching and harnessing it to improve an ensemble outcome. We propose a model for schema matching, based on simple probabilistic principles. We then propose the use of machine learning in determining the best mapping and discuss its pros and cons. Finally, we provide a thorough empirical analysis, using both real-world and synthetic data, to test the proposed technique. We conclude that the proposed heuristic performs well, given an accurate modeling of uncertainty in matcher decision making.
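To make the idea of a schema matcher ensemble concrete, the following is a minimal sketch and not the model or learning procedure proposed in the paper. It assumes two toy name-based matchers (`trigram_matcher` and `prefix_matcher`, both hypothetical), combines their similarity scores with fixed, hand-picked weights, and maps each source attribute to its highest-scoring target attribute. The attribute names and weights are illustrative assumptions only.

```python
from typing import Callable, Dict, List, Tuple

# A matcher scores the similarity of two attribute names in [0, 1].
Matcher = Callable[[str, str], float]


def trigram_matcher(a: str, b: str) -> float:
    """Toy matcher: Jaccard similarity over character trigrams (case-insensitive)."""
    def grams(s: str) -> set:
        s = s.lower()
        # Names shorter than three characters yield a single (short) gram.
        return {s[i:i + 3] for i in range(max(len(s) - 2, 1))}

    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)


def prefix_matcher(a: str, b: str) -> float:
    """Toy matcher: common-prefix length divided by the longer name's length."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))


def ensemble_similarity(a: str, b: str,
                        weighted_matchers: List[Tuple[Matcher, float]]) -> float:
    """Weighted average of the individual matcher scores (weights are assumed)."""
    total = sum(weight for _, weight in weighted_matchers)
    return sum(weight * matcher(a, b) for matcher, weight in weighted_matchers) / total


def best_mapping(source: List[str], target: List[str],
                 weighted_matchers: List[Tuple[Matcher, float]]) -> Dict[str, str]:
    """Map every source attribute to the target attribute with the highest score."""
    return {
        s: max(target, key=lambda t: ensemble_similarity(s, t, weighted_matchers))
        for s in source
    }


# Hypothetical ensemble: the trigram matcher is trusted twice as much.
ENSEMBLE = [(trigram_matcher, 2.0), (prefix_matcher, 1.0)]

if __name__ == "__main__":
    mapping = best_mapping(["arrivalDate", "carType"],
                           ["dateOfArrival", "car_type", "price"],
                           ENSEMBLE)
    print(mapping)
```

The fixed weights stand in for whatever estimate of matcher reliability is available. The abstract's point is that such estimates are uncertain and can instead be learned, which this sketch does not attempt.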
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marie, A., Gal, A. (2007). Managing Uncertainty in Schema Matcher Ensembles. In: Prade, H., Subrahmanian, V.S. (eds) Scalable Uncertainty Management. SUM 2007. Lecture Notes in Computer Science, vol 4772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75410-7_5
DOI: https://doi.org/10.1007/978-3-540-75410-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75407-7
Online ISBN: 978-3-540-75410-7
eBook Packages: Computer Science, Computer Science (R0)