
A Natural and Multi-layered Approach to Detect Changes in Tree-Based Textual Documents

  • Conference paper
Enterprise Information Systems (ICEIS 2009)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 24)


Abstract

Several efficient and powerful algorithms exist for detecting changes in tree-based textual documents, such as those encoded in XML. Yet one important aspect is still underestimated in their design and implementation: the quality of the output in terms of readability, clarity and accuracy for human users. This requirement is particularly relevant when diff-ing literary documents such as books, articles, reviews, legislative acts, and so on. This paper introduces the concept of 'naturalness' in diff-ing tree-based textual documents and discusses a new, extensible set of changes that can and should be detected. A naturalness-based algorithm is presented, together with its application to diff-ing XML-encoded legislative documents. The algorithm, called JNDiff, proved to be very efficient and to detect significantly better matchings, since it recognizes new kinds of operations.
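The granularity problem behind "naturalness" can be illustrated with a toy example. It is not JNDiff and does not reproduce the paper's algorithm. A purely structural diff may report an edited paragraph as one whole text node replaced by another. A reader would rather see which words changed. The sketch below makes several simplifying assumptions: both documents have identical element structure, elements are paired by document order, and only element text (not tail text) is compared. Word-level matching is delegated to Python's standard `difflib.SequenceMatcher`. The helper names `word_changes` and `diff_text_nodes` and the sample legislative snippet are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def word_changes(old_text, new_text):
    """Return word-level edit operations between two strings.

    Uses difflib's SequenceMatcher on word lists; this is a stand-in
    matcher, not the algorithm described in the paper.
    """
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged words are not reported
        ops.append((tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return ops


def diff_text_nodes(old_xml, new_xml):
    """Pair elements by document order (assumes both trees have the same
    shape) and report word-level changes inside each differing text node.
    Tail text after child elements is ignored in this simplified sketch."""
    old_root, new_root = ET.fromstring(old_xml), ET.fromstring(new_xml)
    changes = []
    for old_el, new_el in zip(old_root.iter(), new_root.iter()):
        old_text = (old_el.text or "").strip()
        new_text = (new_el.text or "").strip()
        if old_text != new_text:
            changes.append((old_el.tag, word_changes(old_text, new_text)))
    return changes


# Hypothetical legislative fragment before and after an amendment.
old = "<act><article><p>The fee is ten euros per year.</p></article></act>"
new = "<act><article><p>The fee is twelve euros per year.</p></article></act>"
print(diff_text_nodes(old, new))  # [('p', [('replace', 'ten', 'twelve')])]
```

The report names the single replaced word inside the `p` element instead of the whole paragraph. Real tree diffs must also handle changed structure, moved subtrees and mixed content, none of which this toy attempts.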




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Di Iorio, A., Schirinzi, M., Vitali, F., Marchetti, C. (2009). A Natural and Multi-layered Approach to Detect Changes in Tree-Based Textual Documents. In: Filipe, J., Cordeiro, J. (eds) Enterprise Information Systems. ICEIS 2009. Lecture Notes in Business Information Processing, vol 24. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01347-8_8


  • DOI: https://doi.org/10.1007/978-3-642-01347-8_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01346-1

  • Online ISBN: 978-3-642-01347-8

  • eBook Packages: Computer Science, Computer Science (R0)
