Schema-Less, Semantics-Based Change Detection for XML Documents

  • Conference paper
Web Information Systems – WISE 2004 (WISE 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3306)

Abstract

Schema-less change detection is the process of comparing successive versions of an XML document or data collection to determine which portions are the same and which have changed, without using a schema. Change detection can be used to reduce space in a historical data collection and to support temporal queries. Most previous research has focused on detecting structural changes between document versions, but techniques that depend on structure break down when the structural change is significant. This paper develops an algorithm for detecting change based on the semantics, rather than on the structure, of a document. The algorithm is based on the observation that information that identifies an element is often conserved across changes to a document. The algorithm first isolates identifiers for elements. It then uses these identifiers to associate elements in successive versions.
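
The two steps named in the abstract (isolate identifiers, then associate elements across versions) can be illustrated with a small Python sketch. This is not the paper's algorithm, whose details are not given on this page. It rests on a simplifying assumption made here purely for illustration: an element's identifier is the text of a child element whose value is unique among all elements with the same tag in both versions. The function names `identifier_child` and `diff_by_identifier` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def _unique(elems, cand):
    """True if every element has a <cand> child and all its text values differ."""
    values = [e.findtext(cand) for e in elems]
    return None not in values and len(set(values)) == len(values)

def identifier_child(roots, tag):
    """Step 1 (assumed heuristic): pick a child tag whose text is unique
    among all <tag> elements in every version, or None if there is none."""
    elems_per_root = [list(r.iter(tag)) for r in roots]
    first = next((es[0] for es in elems_per_root if es), None)
    if first is None:
        return None
    for cand in [c.tag for c in first]:
        if all(_unique(es, cand) for es in elems_per_root):
            return cand
    return None

def diff_by_identifier(old_root, new_root, tag):
    """Step 2: associate <tag> elements by identifier value, ignoring
    where they sit in the tree, and report what differs."""
    key = identifier_child([old_root, new_root], tag)
    if key is None:
        raise ValueError(f"no identifying child found for <{tag}>")
    old = {e.findtext(key): e for e in old_root.iter(tag)}
    new = {e.findtext(key): e for e in new_root.iter(tag)}

    def content(e):
        # Child order is ignored on purpose: only tag/text pairs are compared.
        return {c.tag: (c.text or "").strip() for c in e}

    return {
        "key": key,
        "deleted": sorted(old.keys() - new.keys()),
        "inserted": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys()
                          if content(old[k]) != content(new[k])),
    }

# Hypothetical sample data: the new version regroups books under <shelf>
# elements and reorders children, a large structural change.
OLD = """<library>
  <book><title>A</title><price>10</price></book>
  <book><title>B</title><price>10</price></book>
</library>"""

NEW = """<library>
  <shelf><book><price>12</price><title>B</title></book></shelf>
  <shelf>
    <book><title>A</title><price>10</price></book>
    <book><title>C</title><price>10</price></book>
  </shelf>
</library>"""

if __name__ == "__main__":
    print(diff_by_identifier(ET.fromstring(OLD), ET.fromstring(NEW), "book"))
```

On the sample data the sketch selects `title` as the identifier, reports book C as inserted and book B as changed, and treats book A as unchanged even though it moved under a new `shelf` parent. A purely structural comparison would have to account for that move explicitly, which is the situation the abstract describes.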

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, S., Dyreson, C., Snodgrass, R.T. (2004). Schema-Less, Semantics-Based Change Detection for XML Documents. In: Zhou, X., Su, S., Papazoglou, M.P., Orlowska, M.E., Jeffery, K. (eds) Web Information Systems – WISE 2004. WISE 2004. Lecture Notes in Computer Science, vol 3306. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30480-7_29

  • DOI: https://doi.org/10.1007/978-3-540-30480-7_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23894-2

  • Online ISBN: 978-3-540-30480-7

  • eBook Packages: Springer Book Archive
