An Efficient Schema Matching Approach Using Previous Mapping Result Set

  • Hongjie Fan
  • Junfei Liu
  • Wenfeng Luo
  • Kejun Deng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9645)


The widespread adoption of the eXtensible Markup Language (XML) has led a growing number of researchers to design XML-specific schema matching approaches, which aim to find semantic correspondences between concepts in different data sources. In recent years, there has been a growing need for high-performance matching systems that can identify such semantic correspondences across XML data. XML schema matching methods face several challenges in defining, using, and combining element similarity measures. In this paper, we propose an XML schema matching framework based on a previous mapping result set (PMRS). We first parse XML schemas into schema trees and extract schema features. We then construct the PMRS as auxiliary information and run a retrieval algorithm over it. To cope with complex matching discovery, we compute the similarity among the semantic information of XML schemas carried by XML data. Our experimental results demonstrate the performance benefits of the schema matching framework using PMRS.
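The pipeline summarized above (parse schemas into trees, consult previously confirmed mappings, fall back to a similarity measure) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `element_paths` and `match`, the representation of the PMRS as a set of lowercase name pairs, the 0.7 threshold, and the use of `difflib.SequenceMatcher` as a stand-in name similarity are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

XS = "{http://www.w3.org/2001/XMLSchema}"

def element_paths(xsd_text):
    """Return the tree path of every named xs:element in an XSD string."""
    paths = []

    def walk(node, prefix):
        for child in node:
            if child.tag == XS + "element" and child.get("name"):
                path = prefix + "/" + child.get("name")
                paths.append(path)
                walk(child, path)
            else:
                # Pass through xs:complexType, xs:sequence, and similar wrappers.
                walk(child, prefix)

    walk(ET.fromstring(xsd_text), "")
    return paths

def leaf(path):
    """Lowercased element name at the end of a path."""
    return path.rsplit("/", 1)[-1].lower()

def match(source_paths, target_paths, pmrs, threshold=0.7):
    """Map each source path to its best target path.

    pmrs is a hypothetical store of earlier confirmed mappings, modeled here
    as a set of (source_name, target_name) pairs. A hit in pmrs scores 1.0;
    otherwise a string similarity on element names is used (an assumption).
    """
    result = {}
    for s in source_paths:
        best, best_score = None, 0.0
        for t in target_paths:
            if (leaf(s), leaf(t)) in pmrs:
                score = 1.0
            else:
                score = SequenceMatcher(None, leaf(s), leaf(t)).ratio()
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            result[s] = best
    return result

HEAD = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
SRC = HEAD + """
  <xs:element name="order"><xs:complexType><xs:sequence>
    <xs:element name="custName" type="xs:string"/>
    <xs:element name="zip" type="xs:string"/>
  </xs:sequence></xs:complexType></xs:element>
</xs:schema>"""
TGT = HEAD + """
  <xs:element name="purchase"><xs:complexType><xs:sequence>
    <xs:element name="customerName" type="xs:string"/>
    <xs:element name="postalCode" type="xs:string"/>
  </xs:sequence></xs:complexType></xs:element>
</xs:schema>"""

# One previously confirmed mapping: "zip" was matched to "postalCode" before.
pmrs = {("zip", "postalcode")}
mapping = match(element_paths(SRC), element_paths(TGT), pmrs)
```

In this toy example the pair `zip` / `postalCode` has almost no string overlap and is matched only because it appears in the stored mappings, which is the kind of reuse the abstract describes. A real system would also need the structural and semantic measures the paper mentions, which this sketch omits.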


Schema matching; XML; Previous mapping result set



This research is supported by the National Natural Science Foundation of China under Grants No. 61272159 and No. 61402125. All opinions, findings, conclusions, and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Hongjie Fan (1)
  • Junfei Liu (2), corresponding author
  • Wenfeng Luo (1)
  • Kejun Deng (1)
  1. School of Electronics Engineering and Computer Science, Peking University, Beijing, China
  2. National Engineering Research Center for Software Engineering, Peking University, Beijing, China
