Abstract
Digital annotation of web pages raises two problems that are unknown to traditional annotation and that stem from the dynamic and open nature of the Web. The first arises because a document can be replicated over multiple sites, so that the same content can be retrieved at different URLs or through different queries. A web page must therefore be associated with all the annotations pertaining to its content, including those created while the same content was accessed under a different URL. The second arises from the evolution of individual HTML pages, whose changes often consist of insertions, deletions, or movements of page segments. Annotations attached to portions of a page that have moved within the page itself should still be retrieved and shown to the user. To reduce the impact of these phenomena on the usefulness of the annotation process, our annotation system MADCOW incorporates two algorithms: one assesses whether two pages under two different URLs are identical, and the other determines the differences between two versions of a page under the same URL. Based on these assessments, the system takes the appropriate actions to retrieve all the pertaining annotations.
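The abstract does not describe how the two algorithms work. As a minimal sketch, assuming the visible text has already been extracted from the HTML and, for the second check, split into segments (e.g. paragraphs), the two tasks could be approached as follows. The identity test uses Broder-style resemblance (Jaccard similarity of w-word shingle sets) and the relocation step uses Python's `difflib.SequenceMatcher`; both are stand-in techniques chosen for illustration, not MADCOW's actual algorithms. The function names `shingles`, `resemblance`, `same_content` and `relocate`, as well as the thresholds `0.9` and `0.8` and the shingle width `w = 4`, are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def shingles(text: str, w: int = 4) -> set:
    """Set of contiguous w-word shingles of text, after lowercasing and
    keeping only word characters (a simple, assumed normalisation)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        # Very short texts yield a single (possibly shorter) shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: str, b: str, w: int = 4) -> float:
    """Broder-style resemblance: Jaccard similarity of the two shingle sets."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)


def same_content(a: str, b: str, threshold: float = 0.9, w: int = 4) -> bool:
    """Hypothetical identity test for pages fetched from two different URLs."""
    return resemblance(a, b, w) >= threshold


def relocate(old_segments: list, new_segments: list,
             min_ratio: float = 0.8) -> dict:
    """Map each segment index of the old version to the index of its
    best-matching segment in the new version, or to None if no segment is
    similar enough (the segment is then treated as deleted)."""
    mapping = {}
    for i, old in enumerate(old_segments):
        best_j: Optional[int] = None
        best_ratio = 0.0
        for j, new in enumerate(new_segments):
            ratio = SequenceMatcher(None, old, new).ratio()
            # Keep the most similar candidate above the threshold;
            # on ties, the earliest candidate wins.
            if ratio >= min_ratio and ratio > best_ratio:
                best_j, best_ratio = j, ratio
        mapping[i] = best_j
    return mapping


if __name__ == "__main__":
    old = ["Intro paragraph about annotation.",
           "Section on identity of pages.",
           "Footer text."]
    new = ["Section on identity of pages.",
           "Intro paragraph about annotation.",
           "New sidebar."]
    print(relocate(old, new))  # {0: 1, 1: 0, 2: None}
```

In this toy run, annotations anchored to old segment 0 would be re-attached to new segment 1 (the two paragraphs swapped places), while annotations on the deleted footer would be reported as orphaned. The pairwise matching is quadratic in the number of segments, which is acceptable for a single page but would need indexing for larger collections.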
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bottoni, P., Cuomo, M., Levialdi, S., Panizzi, E., Passavanti, M., Trinchese, R. (2007). Differences and Identities in Document Retrieval in an Annotation Environment. In: Bhalla, S. (eds) Databases in Networked Information Systems. DNIS 2007. Lecture Notes in Computer Science, vol 4777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75512-8_11
DOI: https://doi.org/10.1007/978-3-540-75512-8_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75511-1
Online ISBN: 978-3-540-75512-8
eBook Packages: Computer Science, Computer Science (R0)