Abstract
With the exponential growth of information on the web, the emphasis has shifted from merely viewing information to efficient retrieval of, and notification about, selected information. Currently, users must poll pages manually to check for changes of interest, which wastes resources and incurs high cost. Hence, an efficient and effective change detection and notification mechanism is needed. WebVigiL, a general-purpose, active capability-based information monitoring and notification system, handles the specification, management, and propagation of customized changes as requested by a user. Change detection in WebVigiL focuses on detecting customized changes to a document, based on user intent. In this paper, we propose two different algorithms for detecting changes to the contents of semi-structured and unstructured documents. Though the approach taken is general, we explain change detection in the context of HTML (unstructured) and XML (semi-structured) documents. We also provide a simple change presentation scheme to display the computed changes. We highlight change detection in the context of WebVigiL and briefly describe the rest of the system.
This work was supported, in part, by the Office of Naval Research and the SPAWAR System Center-San Diego, by the Rome Laboratory grant F30602-01-2-05430, and by NSF grants IIS-0123730 and ITR 0121297.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pandrangi, N., Jacob, J., Sanka, A., Chakravarthy, S. (2003). WebVigiL: User Profile-Based Change Detection for HTML/XML Documents. In: James, A., Younas, M., Lings, B. (eds) New Horizons in Information Management. BNCOD 2003. Lecture Notes in Computer Science, vol 2712. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45073-4_5
DOI: https://doi.org/10.1007/3-540-45073-4_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40536-8
Online ISBN: 978-3-540-45073-3
eBook Packages: Springer Book Archive