Abstract
Web archives like the Internet Archive preserve the evolutionary history of large portions of the Web. Access to them, however, is still limited to rather restricted interfaces: search functionality is often missing or ignores the time axis. Time-travel search alleviates this shortcoming by enriching keyword queries with a time context of interest. To be effective, time-travel queries require historical PageRank scores. In this paper, we address this requirement and propose rank synopses as a novel structure to compactly represent and reconstruct historical PageRank scores. Rank synopses can reconstruct the PageRank score of a web page as of any point during its lifetime, even in the absence of a snapshot of the Web as of that time. We further devise a normalization scheme for PageRank scores to make them comparable across different graphs. Through a comprehensive evaluation over different datasets, we demonstrate the accuracy and space economy of the proposed methods.
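The abstract does not say how a rank synopsis is represented internally. As a rough illustration of the idea of reconstructing a score between snapshots, the sketch below assumes a synopsis is a sequence of straight-line segments fitted to a page's observed scores within an error tolerance. The names `Segment`, `build_synopsis`, `reconstruct`, and `max_error`, as well as the greedy segmentation strategy, are hypothetical choices made for this example and are not taken from the paper.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One straight-line piece of a (hypothetical) rank synopsis."""
    t_start: float
    t_end: float
    slope: float
    intercept: float

    def value_at(self, t):
        return self.slope * t + self.intercept


def _line(p, q):
    """Line through two (time, score) observations with distinct times."""
    (t0, s0), (t1, s1) = p, q
    slope = (s1 - s0) / (t1 - t0)
    return Segment(t0, t1, slope, s0 - slope * t0)


def _fits(observations, i, k, max_error):
    """True if the line from observation i to k stays within max_error
    of every observation strictly between them."""
    seg = _line(observations[i], observations[k])
    return all(abs(seg.value_at(t) - s) <= max_error
               for t, s in observations[i + 1:k])


def build_synopsis(observations, max_error):
    """Greedily cover time-sorted (time, score) pairs with few segments.

    Assumption: a simple greedy scan, chosen for brevity; it does not
    guarantee the minimum number of segments.
    """
    segments, i, n = [], 0, len(observations)
    while i < n - 1:
        j = i + 1
        while j + 1 < n and _fits(observations, i, j + 1, max_error):
            j += 1
        segments.append(_line(observations[i], observations[j]))
        i = j
    return segments


def reconstruct(segments, t):
    """Approximate the score at time t, even if no snapshot exists at t."""
    if not segments or not segments[0].t_start <= t <= segments[-1].t_end:
        raise ValueError("time lies outside the span covered by the synopsis")
    starts = [seg.t_start for seg in segments]
    return segments[bisect_right(starts, t) - 1].value_at(t)


# Five snapshot scores for one page; times 0.5 and 3.5 have no snapshot.
observations = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 3.0), (4, 3.0)]
synopsis = build_synopsis(observations, max_error=0.01)
print(len(synopsis), reconstruct(synopsis, 0.5), reconstruct(synopsis, 3.5))
```

In this toy input the five observations collapse into two segments, which hints at where the space savings would come from. Scores fed into such a structure would need to be comparable across snapshots, which is the role the abstract assigns to the normalization scheme.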
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Berberich, K., Bedathur, S., Weikum, G. (2007). A Pocket Guide to Web History. In: Ziviani, N., Baeza-Yates, R. (eds) String Processing and Information Retrieval. SPIRE 2007. Lecture Notes in Computer Science, vol 4726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75530-2_8
DOI: https://doi.org/10.1007/978-3-540-75530-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75529-6
Online ISBN: 978-3-540-75530-2
eBook Packages: Computer Science (R0)