Abstract
The World Wide Web is highly dynamic, with resources constantly disappearing and (re-)surfacing. A ubiquitous result is the “404 Page not Found” error returned in response to requests for missing web pages. We investigate tags obtained from Delicious for the purpose of rediscovering such missing web pages with the help of search engines. We determine the best-performing tag-based query length, quantify the relevance of the results, and compare tags to retrieval methods based on a page’s content. We find that tags are useful only in addition to content-based methods. We further introduce the notion of “ghost tags”: terms used as tags that do not occur in the current version of a web page but did occur in a previous version. One third of these ghost tags are ranked highly in Delicious and also occurred frequently in the document, which indicates their importance both to the user and to the content of the document.
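The two tag-based notions in the abstract are a query built from a page's top tags and the "ghost tag". Both can be illustrated with a minimal sketch. Several assumptions apply here and do not describe the authors' actual implementation:

- The Delicious tags arrive as a list already ordered by rank.
- The current and previous versions of the page are available as plain text.
- Terms are lowercase alphanumeric tokens.

The function names `tag_query` and `ghost_tags` are hypothetical. Under this tokenization, multi-word or punctuated tags (e.g. "web2.0") will never match a document term.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric terms (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_query(ranked_tags: list[str], n: int) -> str:
    """Hypothetical helper: build a search-engine query from the n highest-ranked tags."""
    return " ".join(ranked_tags[:n])


def ghost_tags(ranked_tags: list[str], current_text: str,
               previous_text: str) -> list[tuple[str, int, int]]:
    """Hypothetical helper: return (tag, tag_rank, previous_term_frequency) for every
    tag that is absent from the current version but present in the previous one."""
    current_terms = set(tokenize(current_text))
    previous_counts = Counter(tokenize(previous_text))
    ghosts = []
    for rank, tag in enumerate(ranked_tags, start=1):
        term = tag.lower()
        # A ghost tag no longer occurs in the page but did occur in an earlier version.
        if term not in current_terms and previous_counts[term] > 0:
            ghosts.append((term, rank, previous_counts[term]))
    return ghosts


if __name__ == "__main__":
    tags = ["web", "archive", "preservation", "python"]  # illustrative data only
    current = "Web archive home page"
    previous = "Web archive and digital preservation; preservation tools for the web"
    print(tag_query(tags, 3))
    print(ghost_tags(tags, current, previous))
```

Returning each ghost tag's rank together with its term frequency in the earlier version lets one check both properties the abstract relates: whether a ghost tag is ranked highly by users and whether it occurred frequently in the document.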
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Klein, M., Nelson, M.L. (2011). Find, New, Copy, Web, Page - Tagging for the (Re-)Discovery of Web Pages. In: Gradmann, S., Borri, F., Meghini, C., Schuldt, H. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2011. Lecture Notes in Computer Science, vol 6966. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24469-8_5
DOI: https://doi.org/10.1007/978-3-642-24469-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24468-1
Online ISBN: 978-3-642-24469-8
eBook Packages: Computer Science, Computer Science (R0)