Fast Approximate Matching Between XML Documents and Schemata

Xing, Guangming

doi:10.1007/11610113_38

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3841)

Abstract

XML has become the standard format for web publishing and data exchange on the Internet, and much research has been done to provide efficient access to the relevant information that is ubiquitous on the Web. In this paper, we present an algorithm that finds a minimum-cost sequence of top-down edit operations transforming an XML document so that it conforms to a schema. The minimum cost is based on the tree edit distance with top-down edit operations. It is shown that the algorithm runs in O(p × log p × n) time, where p is the size of the schema (grammar) and n is the size of the XML document (tree).

Experimental studies have also shown that the running time of our algorithm is linear with respect to the size of the XML document when a normalized regular hedge grammar is used to specify the schema.
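
To make the notion of a tree edit distance with top-down edit operations more concrete, the following is a minimal sketch of the classic unit-cost top-down distance between two ordered trees, in the style of Selkow's tree-to-tree formulation. It is not the algorithm of this paper, which matches a document against a grammar rather than against a single tree and has a different running time. The names `Node`, `size`, and `top_down_distance`, and the unit costs, are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """An ordered, labeled tree node (hypothetical stand-in for an XML element)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def top_down_distance(a: Node, b: Node) -> int:
    """Unit-cost top-down edit distance between two ordered trees.

    Assumed cost model: relabeling a node costs 1, and inserting or
    deleting a whole subtree costs its number of nodes.
    """
    # The two roots are always mapped to each other.
    cost = 0 if a.label == b.label else 1
    m, n = len(a.children), len(b.children)
    # d[i][j]: cost of turning the first i child subtrees of a
    # into the first j child subtrees of b.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(a.children[i - 1]),   # delete a subtree
                d[i][j - 1] + size(b.children[j - 1]),   # insert a subtree
                d[i - 1][j - 1]                          # edit one into the other
                + top_down_distance(a.children[i - 1], b.children[j - 1]),
            )
    return cost + d[m][n]


doc = Node("book", [Node("title"), Node("author"), Node("author")])
target = Node("book", [Node("title"), Node("author"), Node("year")])
print(top_down_distance(doc, target))  # 1: relabel the last child
```

The sketch recomputes subtree sizes and sub-distances naively, so it is meant only to show the recurrence. Replacing the second tree with a grammar, as the abstract describes, requires minimizing over all trees the grammar accepts.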

Author information

Authors and Affiliations

Department of Computer Science, Western Kentucky University, Bowling Green, KY, 42104, USA
Guangming Xing

Editor information

Editors and Affiliations

School of ITEE, The University of Queensland, Australia
Xiaofang Zhou
School of Computer Science and Technology, Heilongjiang University, China
Jianzhong Li
School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
Heng Tao Shen
Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, 153-8505, Tokyo, Japan
Masaru Kitsuregawa
Victoria University, Australia
Yanchun Zhang

About this paper

Cite this paper

Xing, G. (2006). Fast Approximate Matching Between XML Documents and Schemata. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds) Frontiers of WWW Research and Development - APWeb 2006. APWeb 2006. Lecture Notes in Computer Science, vol 3841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610113_38

DOI: https://doi.org/10.1007/11610113_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31142-3
Online ISBN: 978-3-540-32437-9
eBook Packages: Computer Science, Computer Science (R0)
