Managing Uncertainty in Schema Matcher Ensembles

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4772)

Abstract

Schema matching is the task of matching between concepts describing the meaning of data in various heterogeneous, distributed data sources. With many heuristics to choose from, several tools have enabled the use of schema matcher ensembles, combining principles by which different schema matchers judge the similarity between concepts. In this work, we investigate means of estimating the uncertainty involved in schema matching and harnessing it to improve an ensemble outcome. We propose a model for schema matching, based on simple probabilistic principles. We then propose the use of machine learning in determining the best mapping and discuss its pros and cons. Finally, we provide a thorough empirical analysis, using both real-world and synthetic data, to test the proposed technique. We conclude that the proposed heuristic performs well, given an accurate modeling of uncertainty in matcher decision making.
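
The abstract does not spell out how an ensemble combines its matchers, so the following is only a generic sketch of the idea, not the technique proposed in the paper. It assumes each matcher produces a similarity matrix between the attributes of two schemas, that matrices are combined by a weighted average, and that the weight stands in for how much a matcher's judgment is trusted. The matcher names, weights, and threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def combine(matrices: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Weighted average of per-matcher similarity matrices (assumed scheme)."""
    names = list(matrices)
    total = sum(weights[n] for n in names)
    rows = len(matrices[names[0]])
    cols = len(matrices[names[0]][0])
    return [
        [sum(weights[n] * matrices[n][i][j] for n in names) / total
         for j in range(cols)]
        for i in range(rows)
    ]


def best_match_per_row(sim: Matrix, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """For each source attribute, keep its top-scoring target if above threshold."""
    pairs = []
    for i, row in enumerate(sim):
        j = max(range(len(row)), key=row.__getitem__)
        if row[j] >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical matchers: one compares attribute names, the other data types.
matrices = {
    "name": [[0.9, 0.1], [0.2, 0.4]],
    "dtype": [[0.7, 0.3], [0.6, 0.8]],
}
weights = {"name": 0.5, "dtype": 0.5}

combined = combine(matrices, weights)
mapping = best_match_per_row(combined)
```

With equal weights, the second source attribute is matched to the second target only because the data-type matcher outweighs the name matcher's low score. Shifting the weights toward a matcher is one simple way to express greater confidence in its decisions.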

Editor information

Henri Prade, V. S. Subrahmanian

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Marie, A., Gal, A. (2007). Managing Uncertainty in Schema Matcher Ensembles. In: Prade, H., Subrahmanian, V.S. (eds) Scalable Uncertainty Management. SUM 2007. Lecture Notes in Computer Science, vol 4772. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75410-7_5

  • DOI: https://doi.org/10.1007/978-3-540-75410-7_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75407-7

  • Online ISBN: 978-3-540-75410-7

  • eBook Packages: Computer Science, Computer Science (R0)
