A Framework for Efficient Matching of Large-Scale Metadata Models

  • Research Article - Computer Engineering and Computer Science
  • Arabian Journal for Science and Engineering

Abstract

Despite the progress achieved in matching metadata models, large-scale matching still does not deliver high match quality and efficiency at the same time. To address this challenge, we introduce a generic matching framework, called MetMat, that identifies corresponding entities across XML schemas and/or ontologies (metadata models). The framework is based on a parallelized clustering-based matching approach, which first splits the original matching task into smaller independent tasks. These tasks are then carried out in parallel on desktop platforms equipped with multi-core processors. To this end, we develop three parallel strategies: inter-, intra-, and hybrid-matching. To obtain high match quality, a set of matchers is exploited. The framework is validated through an extensive set of experiments over small and large data sets. We also compare MetMat to top matching tools that participated in the OAEI (Ontology Alignment Evaluation Initiative, http://oaei.ontologymatching.org/) over the last three years. The results show that MetMat with the intra-parallel matching strategy outperforms the other matching strategies in terms of processing time while preserving the same quality. Moreover, the tool achieves a good position relative to the OAEI participants of the last three years.
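
The general idea of splitting a matching task into independent subtasks and running them concurrently can be illustrated with a minimal Python sketch. It is not the MetMat implementation: the clusters are written by hand rather than produced by a clustering step, the single string-similarity matcher stands in for the set of matchers, and the names `name_similarity`, `match_cluster_pair`, `parallel_match` and the 0.8 threshold are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher
from itertools import product


def name_similarity(a, b):
    """Placeholder matcher: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_cluster_pair(task, threshold=0.8):
    """Match one independent task: a pair of clusters of element names."""
    source_cluster, target_cluster = task
    return [
        (s, t)
        for s, t in product(source_cluster, target_cluster)
        if name_similarity(s, t) >= threshold
    ]


def parallel_match(tasks, workers=4):
    """Run the independent tasks concurrently and merge their results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = pool.map(match_cluster_pair, tasks)
        return sorted(pair for result in partial_results for pair in result)


# Hypothetical, hand-made clusters; a real system would derive them by clustering.
tasks = [
    (["author", "title"], ["Author", "BookTitle"]),
    (["price", "currency"], ["Price", "Tax"]),
]
correspondences = parallel_match(tasks)
```

Because each cluster pair is matched without reference to the others, the tasks can be scheduled in any order. A thread pool is used here only for brevity; for CPU-bound matchers in CPython, a process pool would be the more usual choice.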

Acknowledgements

A. Algergawy's work has been funded by the Deutsche Forschungsgemeinschaft (DFG) as part of the CRC 1076 AquaDiva.

Author information

Corresponding author

Correspondence to Alsayed Algergawy.

About this article

Cite this article

Moawed, S., Algergawy, A., Sarhan, A. et al. A Framework for Efficient Matching of Large-Scale Metadata Models. Arab J Sci Eng 44, 3117–3135 (2019). https://doi.org/10.1007/s13369-018-3443-4

  • DOI: https://doi.org/10.1007/s13369-018-3443-4
