Abstract
Personalized PageRank expresses backlink-based page quality around user-selected pages, much as PageRank expresses quality over the entire Web. Existing personalized PageRank algorithms, however, can serve on-line queries only for a restricted choice of selected pages. In this paper we achieve full personalization with a novel algorithm that computes a compact database of simulated random walks; this database can serve arbitrary personal choices of small subsets of web pages. We prove that, for a fixed error probability, the size of our database is linear in the number of web pages. We justify our estimation approach by asymptotic worst-case lower bounds: we show that exact personalized PageRank values can only be obtained from a database of quadratic size.
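The idea of answering personalized queries from precomputed random walks can be illustrated with a minimal Python sketch. It is not the paper's actual database layout or algorithm; it only shows the standard Monte Carlo view in which a walk that stops with probability `c` at each step ends on a page with probability equal to that page's personalized PageRank score. The function names, the value `c = 0.15`, the toy graph, and the rule that a walk stuck on a page without out-links jumps back to its start page are all assumptions made for illustration.

```python
import random
from collections import Counter


def simulate_walk(graph, start, c, rng):
    """Run one random walk from `start` and return the page where it stops.

    At every step the walk stops with probability c; otherwise it follows a
    uniformly random out-link. Pages without out-links send the walk back to
    `start` (an assumed convention, not taken from the paper).
    """
    node = start
    while rng.random() >= c:
        out_links = graph.get(node, [])
        node = rng.choice(out_links) if out_links else start
    return node


def build_database(graph, num_walks, c=0.15, seed=0):
    """Precompute, for every page, the end points of `num_walks` walks.

    The stored data grows as num_walks times the number of pages.
    """
    rng = random.Random(seed)
    return {
        page: Counter(simulate_walk(graph, page, c, rng) for _ in range(num_walks))
        for page in graph
    }


def query(database, preference_pages):
    """Estimate personalized PageRank for a uniformly weighted set of pages.

    Personalized PageRank is linear in the preference vector, so the estimate
    for a set is the average of the per-page end-point frequencies.
    """
    scores = Counter()
    weight = 1.0 / len(preference_pages)
    for page in preference_pages:
        endpoints = database[page]
        total = sum(endpoints.values())
        for end, count in endpoints.items():
            scores[end] += weight * count / total
    return dict(scores)


if __name__ == "__main__":
    # Hypothetical three-page cycle used only as a toy input.
    toy_graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
    db = build_database(toy_graph, num_walks=1000)
    print(query(db, ["a", "b"]))
```

Because the walks are computed once and stored per page, a query for any small set of preferred pages only needs to look up and average the stored end points. The accuracy of the estimate depends on `num_walks`, which is fixed independently of the graph size in this sketch.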
Research was supported by grants OTKA T 42559 and T 42706 of the Hungarian National Science Fund, and NKFP-2/0017/2002 project Data Riddle.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fogaras, D., Rácz, B. (2004). Towards Scaling Fully Personalized PageRank. In: Leonardi, S. (eds) Algorithms and Models for the Web-Graph. WAW 2004. Lecture Notes in Computer Science, vol 3243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30216-2_9
DOI: https://doi.org/10.1007/978-3-540-30216-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23427-2
Online ISBN: 978-3-540-30216-2
eBook Packages: Springer Book Archive