Proximity Search of XML Data Using Ontology and XPath Edit Similarity

  • Toshiyuki Amagasa
  • Lianzi Wen
  • Hiroyuki Kitagawa
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4653)


The volume of XML data is growing explosively, and as a consequence a large amount of XML data has emerged in which similar content is described using different tag names and structures. In such a situation, one cannot write a query against the data without knowing its structure. In this research, we propose a scheme to cope with this problem. Specifically, we expand XPath queries by replacing tag names with similar ones with the help of ontologies. In addition, we aim to realize structural proximity matching of path expressions using edit similarity, a similarity measure based on edit distance. We also discuss the application of SSJoin, an operator that supports similarity joins in relational database systems, to speed up the proposed scheme. Finally, we show the effectiveness of the proposed method through a series of experiments.
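
The two ideas in the abstract, tag-name expansion and proximity matching of paths by edit similarity, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a path expression is a simple sequence of child steps, that edit similarity is the Levenshtein distance over steps normalized by the longer path length (the abstract does not give the exact formula), and that the ontology is replaced by a small hypothetical table named `SYNONYMS`. All function names are illustrative.

```python
from itertools import product

# Hypothetical stand-in for an ontology lookup; the entries are made up.
SYNONYMS = {
    "author": {"author", "writer"},
    "book": {"book", "publication"},
}


def steps(path):
    """Split a simple child-axis path such as '/bib/book/author' into tag names."""
    return [s for s in path.split("/") if s]


def edit_distance(a, b):
    """Levenshtein distance between two sequences of path steps."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete a step
                            curr[j - 1] + 1,      # insert a step
                            prev[j - 1] + cost))  # substitute a step
        prev = curr
    return prev[-1]


def edit_similarity(p, q):
    """Assumed definition: 1 - distance / length of the longer path."""
    a, b = steps(p), steps(q)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def expand(path):
    """Generate query variants by substituting similar tag names."""
    choices = [sorted(SYNONYMS.get(s, {s})) for s in steps(path)]
    return ["/" + "/".join(combo) for combo in product(*choices)]


def proximity_search(query, data_paths, threshold):
    """Return (path, score) pairs scoring at least threshold, best first."""
    results = []
    for data_path in data_paths:
        score = max(edit_similarity(q, data_path) for q in expand(query))
        if score >= threshold:
            results.append((data_path, score))
    return sorted(results, key=lambda r: (-r[1], r[0]))
```

Under these assumptions, the query `/bib/book/author` matches the data path `/bib/publication/writer` with similarity 1.0 after expansion, and matches `/bib/section/book/writer` with similarity 0.75, because one extra step separates the two four-step-or-shorter paths. Comparing every expanded query against every data path is quadratic, which is the kind of cost a similarity-join operator such as SSJoin is meant to reduce.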


Resource Description Framework, Edit Distance, SPARQL Query, Path Expression, XPath Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Toshiyuki Amagasa (1, 2)
  • Lianzi Wen (1)
  • Hiroyuki Kitagawa (1, 2)
  1. Graduate School of Systems and Information Engineering, Department of Computer Science
  2. Center for Computational Sciences, University of Tsukuba, 1–1–1 Tennodai, Tsukuba 305–8573, Japan
