Towards a General Framework for Effective Solutions to the Data Mapping Problem

  • Chapter
Journal on Data Semantics XIV

Part of the book series: Lecture Notes in Computer Science (JODS, volume 5880)

Abstract

Automating the discovery of mappings between structured data sources is a long-standing and important problem in data management. We discuss the rich history of the problem and the variety of technical solutions advanced in the database community over the previous four decades. Based on this discussion, we develop a basic statement of the data mapping problem and a general framework for reasoning about the design space of system solutions to the problem. We then concretely illustrate the framework with the Tupelo system for data mapping discovery, focusing on the important common case of relational data sources. Treating mapping discovery as example-driven search in a space of transformations, Tupelo generates queries encompassing the full range of structural and semantic heterogeneities encountered in relational data mapping. Hence, Tupelo is applicable in a wide range of data mapping scenarios. Finally, we present the results of extensive empirical validation on both synthetic and real-world datasets, indicating that the system is both viable and effective.
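To make the phrase "example-driven search in a space of transformations" concrete, the following is a minimal sketch and not the Tupelo implementation. It assumes a toy operator set (attribute rename and attribute drop only) and plain breadth-first search; the names `relation`, `rename`, `drop`, `successors`, and `discover_mapping` are hypothetical and chosen for illustration. Given a small source instance and the corresponding target instance, the search returns a sequence of operators that turns one into the other.

```python
from collections import deque

# A relation is a frozenset of rows; each row is a frozenset of
# (attribute, value) pairs, so relations are hashable and comparable.
def relation(rows):
    return frozenset(frozenset(r.items()) for r in rows)

def attributes(rel):
    return {a for row in rel for a, _ in row}

# Two illustrative operators (assumed for this sketch).
def rename(rel, old, new):
    return frozenset(
        frozenset((new if a == old else a, v) for a, v in row) for row in rel
    )

def drop(rel, attr):
    return frozenset(
        frozenset((a, v) for a, v in row if a != attr) for row in rel
    )

def successors(rel, target_attrs):
    """Yield (operator, resulting relation) pairs reachable in one step."""
    attrs = attributes(rel)
    for a in sorted(attrs):
        if a in target_attrs:
            continue  # attribute already matches the target schema
        yield ("drop", a), drop(rel, a)
        for b in sorted(target_attrs - attrs):
            yield ("rename", a, b), rename(rel, a, b)

def discover_mapping(source, target, max_depth=4):
    """Breadth-first search for an operator sequence mapping source to target."""
    target_attrs = attributes(target)
    frontier = deque([(source, [])])
    seen = {source}
    while frontier:
        rel, path = frontier.popleft()
        if rel == target:
            return path
        if len(path) >= max_depth:
            continue
        for op, nxt in successors(rel, target_attrs):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [op]))
    return None  # no mapping found within the depth bound

# Example instances (made up for illustration).
source = relation([
    {"name": "Ann", "yr": 2001, "tmp": "x"},
    {"name": "Bob", "yr": 2003, "tmp": "y"},
])
target = relation([
    {"name": "Ann", "year": 2001},
    {"name": "Bob", "year": 2003},
])
path = discover_mapping(source, target)
print(path)
```

The discovered operator sequence can be replayed on other instances of the source schema, which is the sense in which the example pair "drives" the discovery of a mapping. A realistic system would need a much richer operator set and a heuristic to keep the search tractable; both are outside the scope of this sketch.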

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Fletcher, G.H.L., Wyss, C.M. (2009). Towards a General Framework for Effective Solutions to the Data Mapping Problem. In: Spaccapietra, S., Delcambre, L. (eds) Journal on Data Semantics XIV. Lecture Notes in Computer Science, vol 5880. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10562-3_2

  • DOI: https://doi.org/10.1007/978-3-642-10562-3_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10561-6

  • Online ISBN: 978-3-642-10562-3

  • eBook Packages: Computer Science (R0)
