Abstract
Existing search engines hold a picture of the Web from the past, and their ranking algorithms rely on data crawled some time ago. However, users require information that is not only relevant but also fresh. We have developed a method for adjusting the ranking of search engine results with respect to page freshness and relevance. It uses an algorithm that post-processes search engine results based on changes in page content. By analyzing archived versions of web pages, we estimate the temporal qualities of pages, that is, the general freshness and relevance of each page to the query topic over certain time frames. For the top-quality web pages, we analyze the content differences between the past snapshots indexed by a search engine and the present versions of the pages. Based on these differences, the algorithm assigns new ranks to the web pages without the need to maintain a constantly updated index of web documents.
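The post-processing idea can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract gives no scoring formula, so the `Result` record, the term-level diff, the `change_relevance` measure, and the linear `weight` blend are all assumptions chosen for illustration. The sketch compares the snapshot a search engine indexed with the live page and promotes pages whose newly added content mentions the query terms.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical record for one search result (not from the paper)."""
    url: str
    rank: int          # original search-engine rank, 1 = best
    cached_text: str   # snapshot indexed by the search engine
    live_text: str     # page content as fetched now


def tokenize(text):
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def added_terms(old, new):
    """Terms that occur more often in the live page than in the snapshot."""
    return Counter(tokenize(new)) - Counter(tokenize(old))


def change_relevance(result, query):
    """Fraction of newly added terms that are query terms (assumed measure)."""
    added = added_terms(result.cached_text, result.live_text)
    total = sum(added.values())
    if total == 0:
        return 0.0  # nothing was added, so there is no fresh content to reward
    query_terms = set(tokenize(query))
    hits = sum(count for term, count in added.items() if term in query_terms)
    return hits / total


def rerank(results, query, weight=0.5):
    """Blend the original rank with the relevance of changed content."""
    n = len(results)

    def score(r):
        base = (n - r.rank + 1) / n  # 1.0 for rank 1, falling to 1/n
        return (1 - weight) * base + weight * change_relevance(r, query)

    return sorted(results, key=score, reverse=True)


results = [
    Result("a", 1, "weather report", "weather report"),
    Result("b", 2, "old news", "old news earthquake"),
]
for r in rerank(results, "earthquake"):
    print(r.url)
```

Because only the returned results are fetched and compared, a sketch like this needs no continuously refreshed index, which matches the motivation stated in the abstract. Setting `weight=0.0` reproduces the original ordering.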
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jatowt, A., Kawai, Y., Tanaka, K. (2005). Temporal Ranking of Search Engine Results. In: Ngu, A.H.H., Kitsuregawa, M., Neuhold, E.J., Chung, JY., Sheng, Q.Z. (eds) Web Information Systems Engineering – WISE 2005. WISE 2005. Lecture Notes in Computer Science, vol 3806. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11581062_4
DOI: https://doi.org/10.1007/11581062_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30017-5
Online ISBN: 978-3-540-32286-3
eBook Packages: Computer Science, Computer Science (R0)