Fast Approximate Matching Between XML Documents and Schemata

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 3841))

Abstract

XML has become the standard format for web publishing and data exchange on the Internet, and much research has been done to provide efficient access to the relevant information that is ubiquitous on the Web. In this paper, we present an algorithm that finds a minimum-cost sequence of top-down edit operations transforming an XML document so that it conforms to a schema. The cost is measured by the tree edit distance restricted to top-down edit operations. We show that the algorithm runs in O(p × log p × n) time, where p is the size of the schema (grammar) and n is the size of the XML document (tree).

Experimental studies also show that the running time of the algorithm is linear in the size of the XML document when a normalized regular hedge grammar is used to specify the schema.
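
To make the cost model concrete, the following is a minimal sketch of a top-down tree edit distance between two ordered, labeled trees, in the style of Selkow's tree-to-tree editing formulation. It is not the algorithm of this paper: it compares a document tree against a single target tree rather than against a grammar, and it makes no attempt to reach the stated running time. The unit costs, the `(label, children)` tuple encoding, and the names `size` and `top_down_distance` are assumptions made for illustration.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a
# tuple of trees. Tuples are hashable, so results can be memoized.


def size(tree):
    """Number of nodes in the tree."""
    _, children = tree
    return 1 + sum(size(child) for child in children)


@lru_cache(maxsize=None)
def top_down_distance(a, b):
    """Unit-cost top-down edit distance between ordered trees a and b.

    Allowed operations (assumed unit cost each): relabel a node,
    insert a leaf, delete a leaf. Under this restriction, inserting or
    deleting a whole subtree costs as many operations as it has nodes.
    """
    (label_a, kids_a), (label_b, kids_b) = a, b
    relabel = 0 if label_a == label_b else 1

    m, n = len(kids_a), len(kids_b)
    # dp[i][j]: cost of turning the first i children of a into the
    # first j children of b (a sequence edit distance over subtrees).
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + size(kids_a[i - 1])  # delete subtree
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + size(kids_b[j - 1])  # insert subtree
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(kids_a[i - 1]),
                dp[i][j - 1] + size(kids_b[j - 1]),
                dp[i - 1][j - 1]
                + top_down_distance(kids_a[i - 1], kids_b[j - 1]),
            )
    return relabel + dp[m][n]


# Hypothetical example: the document lacks a <year> child that the
# target tree has, so one leaf insertion is enough.
doc = ("book", (("title", ()), ("author", ())))
target = ("book", (("title", ()), ("author", ()), ("year", ())))
print(top_down_distance(doc, target))  # prints 1
```

Matching against a schema instead of one fixed tree means minimizing this kind of cost over every tree the grammar accepts, which is the problem the paper addresses.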

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xing, G. (2006). Fast Approximate Matching Between XML Documents and Schemata. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_38

  • DOI: https://doi.org/10.1007/11610113_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31142-3

  • Online ISBN: 978-3-540-32437-9

  • eBook Packages: Computer Science, Computer Science (R0)