Managing Uncertainty in Schema Matcher Ensembles

Marie, Anan; Gal, Avigdor

doi:10.1007/978-3-540-75410-7_5

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4772)

Abstract

Schema matching is the task of matching between concepts describing the meaning of data in various heterogeneous, distributed data sources. With many heuristics to choose from, several tools have enabled the use of schema matcher ensembles, combining principles by which different schema matchers judge the similarity between concepts. In this work, we investigate means of estimating the uncertainty involved in schema matching and harnessing it to improve an ensemble outcome. We propose a model for schema matching, based on simple probabilistic principles. We then propose the use of machine learning in determining the best mapping and discuss its pros and cons. Finally, we provide a thorough empirical analysis, using both real-world and synthetic data, to test the proposed technique. We conclude that the proposed heuristic performs well, given an accurate modeling of uncertainty in matcher decision making.
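
As a rough illustration of what a schema matcher ensemble does, the sketch below combines two toy matchers into one similarity matrix and then picks a 1:1 mapping. It is not the probabilistic model or the learning technique proposed in the paper. The two matchers, the attribute names, and the weights (standing in for an assumed estimate of how reliable each matcher is) are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import permutations


def name_similarity(a: str, b: str) -> float:
    """Toy matcher 1: character-level similarity of two names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_similarity(a: str, b: str) -> float:
    """Toy matcher 2: Jaccard overlap of underscore-separated tokens, in [0, 1]."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def ensemble_matrix(source, target, matchers, weights):
    """Weighted average of the matchers' similarity scores for every attribute pair."""
    total = sum(weights)
    return [
        [sum(w * m(s, t) for m, w in zip(matchers, weights)) / total for t in target]
        for s in source
    ]


def best_mapping(source, target, sim):
    """Exhaustively pick the 1:1 mapping with the highest total similarity.

    Brute force is only reasonable for tiny schemas; it keeps the sketch short.
    """
    best = max(
        permutations(range(len(target)), len(source)),
        key=lambda perm: sum(sim[i][j] for i, j in enumerate(perm)),
    )
    return {source[i]: target[j] for i, j in enumerate(best)}


# Hypothetical attribute names from two schemas.
source = ["cust_name", "cust_phone"]
target = ["phone_number", "customer_name"]

# Assumed weights: the first matcher is trusted slightly more than the second.
sim = ensemble_matrix(source, target, [name_similarity, token_similarity], [0.6, 0.4])
mapping = best_mapping(source, target, sim)
print(mapping)
```

In this sketch the weights are fixed by hand. Replacing them with values estimated from the observed behavior of each matcher is one simple way to let an uncertainty estimate influence the ensemble outcome.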

Author information

Authors and Affiliations

Technion – Israel Institute of Technology
Anan Marie & Avigdor Gal

Editor information

Henri Prade, V. S. Subrahmanian

Cite this paper

Marie, A., Gal, A. (2007). Managing Uncertainty in Schema Matcher Ensembles. In: Prade, H., Subrahmanian, V.S. (eds) Scalable Uncertainty Management. SUM 2007. Lecture Notes in Computer Science, vol 4772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75410-7_5

DOI: https://doi.org/10.1007/978-3-540-75410-7_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75407-7
Online ISBN: 978-3-540-75410-7
eBook Packages: Computer Science, Computer Science (R0)