Towards Scaling Fully Personalized PageRank

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3243)

Abstract

Personalized PageRank expresses backlink-based page quality around user-selected pages in much the same way that PageRank expresses quality over the entire Web. Existing personalized PageRank algorithms can, however, serve on-line queries only for a restricted choice of selected pages. In this paper we achieve full personalization with a novel algorithm that computes a compact database of simulated random walks; this database can serve arbitrary personal choices of small subsets of web pages. We prove that, for a fixed error probability, the size of our database is linear in the number of web pages. We justify our estimation approach with asymptotic worst-case lower bounds: we show that exact personalized PageRank values can only be obtained from a database of quadratic size.
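The idea of answering personalized queries from precomputed random walks can be illustrated with a minimal Monte Carlo sketch. It relies on the standard fact that the personalized PageRank of page v with respect to page u equals the probability that a random walk started at u, which stops before each step with teleport probability c, ends at v. The names `build_walk_database` and `estimate_ppr`, the default c = 0.15, and the choice to end a walk at a page without out-links are assumptions made for illustration; this is not the paper's actual database construction, and its compact encoding is not reproduced here.

```python
import random
from collections import Counter


def sample_walk_endpoint(graph, start, c, rng):
    """Walk along out-links from `start`, stopping with probability c before each step."""
    node = start
    while rng.random() >= c:
        successors = graph.get(node, [])
        if not successors:
            # Assumption for this sketch: a walk simply ends at a dangling page.
            break
        node = rng.choice(successors)
    return node


def build_walk_database(graph, walks_per_page, c=0.15, seed=0):
    """Precompute `walks_per_page` walk endpoints for every page (hypothetical helper).

    The result stores len(graph) * walks_per_page endpoints, so for a fixed
    number of walks per page its size grows linearly with the number of pages.
    """
    rng = random.Random(seed)
    return {
        page: [sample_walk_endpoint(graph, page, c, rng) for _ in range(walks_per_page)]
        for page in graph
    }


def estimate_ppr(database, preferred_pages):
    """Estimate personalized PageRank for a uniform preference over `preferred_pages`."""
    scores = Counter()
    weight = 1.0 / len(preferred_pages)
    for page in preferred_pages:
        endpoints = database[page]
        for endpoint, count in Counter(endpoints).items():
            # Empirical endpoint frequency approximates PPR from this start page.
            scores[endpoint] += weight * count / len(endpoints)
    return dict(scores)


if __name__ == "__main__":
    # Toy graph: adjacency lists of out-links.
    toy_graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
    database = build_walk_database(toy_graph, walks_per_page=1000)
    print(estimate_ppr(database, ["a", "c"]))
```

Once the database is built, a query for any small set of preferred pages only reads the stored endpoints of those pages, which is what makes arbitrary personalization cheap at query time in this sketch. The estimates are approximate, and storing more walks per page reduces the sampling error.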

Research was supported by grants OTKA T 42559 and T 42706 of the Hungarian National Science Fund, and NKFP-2/0017/2002 project Data Riddle.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fogaras, D., Rácz, B. (2004). Towards Scaling Fully Personalized PageRank. In: Leonardi, S. (eds) Algorithms and Models for the Web-Graph. WAW 2004. Lecture Notes in Computer Science, vol 3243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30216-2_9

  • DOI: https://doi.org/10.1007/978-3-540-30216-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23427-2

  • Online ISBN: 978-3-540-30216-2

  • eBook Packages: Springer Book Archive
