WebVigiL: User Profile-Based Change Detection for HTML/XML Documents

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2712)

Abstract

With the exponential growth of information on the web, the emphasis has shifted from merely viewing information to efficiently retrieving selected information and being notified of it. Currently, users must poll pages manually to check for changes of interest, which wastes resources and incurs high cost. Hence, an efficient and effective change detection and notification mechanism is needed. WebVigiL, a general-purpose, active capability-based information monitoring and notification system, handles the specification, management, and propagation of customized changes as requested by a user. Change detection in WebVigiL focuses on detecting customized changes to a document, based on user intent. In this paper, we propose two different algorithms for detecting changes to the contents of semistructured and unstructured documents. Although the approach is general, we explain change detection in the context of HTML (unstructured) and XML (semistructured) documents. We also provide a simple change presentation scheme to display the computed changes. We highlight change detection in the context of WebVigiL and briefly describe the rest of the system.
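
As a rough illustration of what detecting a customized change based on user intent can look like, the sketch below reports only those insertions and deletions between two page versions that involve a keyword the user cares about. It is not either of the two algorithms proposed in the paper, whose details are not given in this abstract. The function names `extract_words` and `keyword_changes` are hypothetical, the regex-based tag stripping is a crude stand-in for a real HTML parser, and Python's standard `difflib.SequenceMatcher` is assumed as a generic differencing routine.

```python
import re
from difflib import SequenceMatcher


def extract_words(html):
    """Crudely strip tags and return the lowercase word sequence of the page text.

    Assumption: a regex is good enough for this illustration; a real system
    would use a proper HTML or XML parser.
    """
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_changes(old_html, new_html, keyword):
    """Count occurrences of `keyword` in the inserted and deleted word runs.

    Changes that do not involve the keyword are ignored, which is the
    'customized change' idea in miniature.
    """
    old_words = extract_words(old_html)
    new_words = extract_words(new_html)
    key = keyword.lower()
    changes = {"inserted": 0, "deleted": 0}
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' removes old_words[i1:i2] and adds new_words[j1:j2],
        # so it contributes to both counters.
        if tag in ("delete", "replace"):
            changes["deleted"] += old_words[i1:i2].count(key)
        if tag in ("insert", "replace"):
            changes["inserted"] += new_words[j1:j2].count(key)
    return changes


if __name__ == "__main__":
    old_page = "<p>Weather today: sunny.</p>"
    new_page = "<p>Weather today: sunny. Storm warning issued.</p>"
    print(keyword_changes(old_page, new_page, "storm"))
```

A user who registered interest in "storm" would be notified of the second version, while a user interested in "sunny" would not, because that word is unchanged between the two versions.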

This work was supported, in part, by the Office of Naval Research and the SPAWAR System Center-San Diego, by Rome Laboratory grant F30602-01-2-05430, and by NSF grants IIS-0123730 and ITR 0121297.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pandrangi, N., Jacob, J., Sanka, A., Chakravarthy, S. (2003). WebVigiL: User Profile-Based Change Detection for HTML/XML Documents. In: James, A., Younas, M., Lings, B. (eds) New Horizons in Information Management. BNCOD 2003. Lecture Notes in Computer Science, vol 2712. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45073-4_5

  • DOI: https://doi.org/10.1007/3-540-45073-4_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40536-8

  • Online ISBN: 978-3-540-45073-3

  • eBook Packages: Springer Book Archive
