Abstract
Schema-less change detection is the process of comparing successive versions of an XML document or data collection to determine which portions are the same and which have changed, without using a schema. Change detection can be used to reduce space in a historical data collection and to support temporal queries. Most previous research has focused on detecting structural changes between document versions, but techniques that depend on structure break down when the structural change is significant. This paper develops an algorithm for detecting change based on the semantics, rather than the structure, of a document. The algorithm is based on the observation that information that identifies an element is often conserved across changes to a document. The algorithm first isolates identifiers for elements. It then uses these identifiers to associate elements in successive versions.
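The two phases named in the abstract (isolate identifiers, then associate elements by identifier) can be illustrated with a minimal sketch. The abstract does not say how identifiers are isolated, so the heuristic here is an assumption, not the paper's method: a leaf child whose text is present in, and unique across, every element with a given tag (for example `isbn` under each `book`) is treated as that element's identifier. Elements in the two versions are then paired by `(tag, identifier)` wherever they sit in the tree. All function names (`leaf_values`, `candidate_keys`, `identify`, `signature`, `semantic_diff`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def leaf_values(elem):
    """Text of each direct leaf child, keyed by tag (first occurrence wins)."""
    values = {}
    for child in elem:
        text = (child.text or "").strip()
        if len(child) == 0 and text:
            values.setdefault(child.tag, text)
    return values


def candidate_keys(root):
    """Assumed heuristic: for each tag, pick a leaf-child tag whose text is
    present in, and unique across, every element carrying that tag."""
    by_tag = defaultdict(list)
    for elem in root.iter():
        by_tag[elem.tag].append(leaf_values(elem))
    keys = {}
    for tag, rows in by_tag.items():
        common = set.intersection(*(set(r) for r in rows))
        for child_tag in sorted(common):
            values = [r[child_tag] for r in rows]
            if len(set(values)) == len(values):
                keys[tag] = child_tag
                break
    return keys


def identify(root, keys):
    """Map each keyed element to its identifier (tag, key value)."""
    ids = {}
    for elem in root.iter():
        child_tag = keys.get(elem.tag)
        if child_tag is not None:
            ids[(elem.tag, leaf_values(elem)[child_tag])] = elem
    return ids


def signature(elem):
    """Order-insensitive content of a subtree: its sorted (tag, text) leaves."""
    return sorted((e.tag, (e.text or "").strip()) for e in elem.iter() if len(e) == 0)


def semantic_diff(old_xml, new_xml):
    old_root, new_root = ET.fromstring(old_xml), ET.fromstring(new_xml)
    old_keys, new_keys = candidate_keys(old_root), candidate_keys(new_root)
    # Phase 1: keep only identifiers that are valid in both versions.
    keys = {t: k for t, k in old_keys.items() if new_keys.get(t) == k}
    # Phase 2: associate elements by identifier, ignoring tree position.
    old_ids, new_ids = identify(old_root, keys), identify(new_root, keys)
    shared = old_ids.keys() & new_ids.keys()
    return {
        "deleted": sorted(old_ids.keys() - new_ids.keys()),
        "inserted": sorted(new_ids.keys() - old_ids.keys()),
        "changed": sorted(i for i in shared if signature(old_ids[i]) != signature(new_ids[i])),
        "unchanged": sorted(i for i in shared if signature(old_ids[i]) == signature(new_ids[i])),
    }


old = ("<catalog>"
       "<book><isbn>1</isbn><title>A</title><price>10</price></book>"
       "<book><isbn>2</isbn><title>B</title><price>20</price></book>"
       "</catalog>")
new = ("<store><section>"
       "<book><isbn>1</isbn><title>A</title><price>10</price></book>"
       "<book><isbn>2</isbn><title>B</title><price>25</price></book>"
       "</section>"
       "<book><isbn>3</isbn><title>C</title><price>30</price></book>"
       "</store>")
print(semantic_diff(old, new))
# {'deleted': [], 'inserted': [('book', '3')], 'changed': [('book', '2')], 'unchanged': [('book', '1')]}
```

Because elements are matched by identifier rather than by position, moving the books from `catalog` into `store/section` is not reported as a change; only the price edit and the new book are. Note that a tag occurring only once in a version trivially passes the uniqueness test, so a practical implementation would need stronger evidence before trusting a candidate identifier.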
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, S., Dyreson, C., Snodgrass, R.T. (2004). Schema-Less, Semantics-Based Change Detection for XML Documents. In: Zhou, X., Su, S., Papazoglou, M.P., Orlowska, M.E., Jeffery, K. (eds) Web Information Systems – WISE 2004. WISE 2004. Lecture Notes in Computer Science, vol 3306. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30480-7_29
DOI: https://doi.org/10.1007/978-3-540-30480-7_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23894-2
Online ISBN: 978-3-540-30480-7
eBook Packages: Springer Book Archive