Find, New, Copy, Web, Page - Tagging for the (Re-)Discovery of Web Pages

  • Conference paper
Research and Advanced Technology for Digital Libraries (TPDL 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6966)

Abstract

The World Wide Web is highly dynamic, with resources constantly disappearing and (re-)surfacing. A ubiquitous result is the “404 Page not Found” error returned in response to requests for missing web pages. We investigate tags obtained from Delicious for the purpose of rediscovering such missing web pages with the help of search engines. We determine the best-performing tag-based query length, quantify the relevance of the results, and compare tags to retrieval methods based on a page’s content. We find that tags are only useful in addition to content-based methods. We further introduce the notion of “ghost tags”: terms used as tags that do not occur in the current version of the web page but did occur in a previous one. One third of these ghost tags are ranked high in Delicious and also occurred frequently in the document, which indicates their importance to both the user and the content of the document.
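The two ideas in the abstract, tag-based queries of a given length and ghost tags, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tokenize`, `ghost_tags`, and `tag_query` are hypothetical, tags are assumed to be single terms, and term matching is assumed to be a simple case-insensitive comparison against the page text.

```python
import re


def tokenize(text):
    """Lowercase a text and return its set of alphanumeric terms (assumed matching rule)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ghost_tags(tags, current_text, previous_text):
    """Return tags that are absent from the current version but present in a previous one."""
    current_terms = tokenize(current_text)
    previous_terms = tokenize(previous_text)
    return [
        tag for tag in tags
        if tag.lower() not in current_terms and tag.lower() in previous_terms
    ]


def tag_query(tags_by_rank, n):
    """Join the n highest-ranked tags into a search engine query string."""
    return " ".join(tags_by_rank[:n])
```

For example, `tag_query(["web", "page", "copy", "find"], 3)` builds a three-term query from the top-ranked tags, and `ghost_tags` applied to the tags of a page, its current text, and an archived copy returns the tags that only the archived copy contains.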

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Klein, M., Nelson, M.L. (2011). Find, New, Copy, Web, Page - Tagging for the (Re-)Discovery of Web Pages. In: Gradmann, S., Borri, F., Meghini, C., Schuldt, H. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2011. Lecture Notes in Computer Science, vol 6966. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24469-8_5

  • DOI: https://doi.org/10.1007/978-3-642-24469-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-24468-1

  • Online ISBN: 978-3-642-24469-8

  • eBook Packages: Computer Science, Computer Science (R0)
