Information Retrieval, Volume 15, Issue 6, pp 503–526

Lumping algorithms for computing Google’s PageRank and its derivative, with attention to unreferenced nodes

  • Qing Yu
  • Zhengke Miao
  • Gang Wu
  • Yimin Wei

Abstract

In this paper, we introduce five types of nodes for lumping the Web matrix, and give a unified presentation of some popular lumping methods for PageRank. We show that the PageRank problem can be reduced to solving the PageRank problem corresponding to the strongly non-dangling and referenced nodes, and that the full PageRank vector can then be easily recovered by some recursive formulas. Our new lumping strategy can reduce the original PageRank problem to a much smaller one, and it is much cheaper than the recursive reordering scheme. Furthermore, we discuss the sensitivity of the PageRank vector, and present a lumping algorithm for computing its first-order derivative. Numerical experiments show that the new algorithms are favorable when the matrix is large and the damping factor is high.
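As a rough illustration of why dangling and unreferenced nodes invite special treatment, the sketch below runs a plain power iteration on a toy graph and checks that the score of every unreferenced node (a node with no in-links) depends only on the teleportation term and the total dangling mass. This is not the lumping algorithm of the paper; the graph, the function names `classify` and `pagerank`, the uniform personalization and dangling vectors, and the damping factor 0.85 are all assumptions made for the example.

```python
def classify(links, n):
    """Return (dangling, unreferenced) node lists for a graph on nodes 0..n-1."""
    referenced = {j for targets in links.values() for j in targets}
    dangling = [i for i in range(n) if not links.get(i)]         # no out-links
    unreferenced = [i for i in range(n) if i not in referenced]  # no in-links
    return dangling, unreferenced


def pagerank(links, n, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration with uniform teleportation and uniform
    redistribution of the dangling-node mass (both are assumptions)."""
    out_deg = [len(links.get(i, [])) for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        dangling_mass = sum(pi[i] for i in range(n) if out_deg[i] == 0)
        # Share that every node receives regardless of its in-links.
        base = (1.0 - alpha) / n + alpha * dangling_mass / n
        new = [base] * n
        for i in range(n):
            for j in links.get(i, []):
                new[j] += alpha * pi[i] / out_deg[i]
        diff = sum(abs(a - b) for a, b in zip(new, pi))
        pi = new
        if diff < tol:
            break
    return pi


# Hypothetical toy graph: node 3 is dangling, nodes 0 and 4 are unreferenced.
links = {0: [1, 2], 1: [2], 2: [1, 3], 4: [2]}
dangling, unreferenced = classify(links, 5)
pi = pagerank(links, 5)
mass = sum(pi[i] for i in dangling)
for j in unreferenced:
    # An unreferenced node collects nothing through links, so its score
    # is fully determined by the teleportation and dangling terms.
    print(j, pi[j], (1 - 0.85) / 5 + 0.85 * mass / 5)
```

Under these assumptions the two printed values for each unreferenced node agree up to the iteration tolerance, which gives one intuition for why such nodes can be handled outside the main iteration once the scores of the remaining nodes are known.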

Keywords

Google; PageRank; Web information retrieval; Dangling nodes; Unreferenced nodes

Notes

Acknowledgments

We would like to express our sincere thanks to Professor Justin Zobel and two anonymous reviewers for their invaluable suggestions, which helped us greatly improve the presentation of this paper. We are also grateful to Dr. David Gleich and Professor Tim Davis for data files of the Web matrices, and to Dr. Amy N. Langville for providing us with the MATLAB codes of the recursive reordering algorithm (Langville and Meyer 2006a). Moreover, we thank Dr. Xing-hua Shi for the MATLAB codes of Algorithms 1 and 2. Finally, Gang Wu and Qing Yu would like to thank the School of Mathematical Sciences of Xuzhou Normal University for the use of its facilities during the development of this project. Zhengke Miao is supported by the National Natural Science Foundation of China under grants 10871166 and 11171288. Gang Wu is supported by the National Science Foundation of China under grants 10901132 and 11171289, the Qing-Lan Project of Jiangsu Province, and the 333 Project of Jiangsu Province. Yimin Wei is supported by the National Natural Science Foundation of China under grant 10871051, the Doctoral Program of the Ministry of Education under grant 20090071110003, the 973 Program Project under grant 2010CB327900, the Shanghai Education Committee under Dawn Project 08SG01, and the Shanghai Science & Technology Committee.

References

  1. Arasu, A. (2002). PageRank computation and the structure of the Web: Experiments and algorithms. http://www2002.org/CDROM/poster/173.pdf.
  2. Avrachenkov, K., Litvak, N., Nemirovsky, D., & Osipova, N. (2007). Monte Carlo methods in PageRank computation: When one iteration is sufficient. SIAM Journal on Numerical Analysis, 45(2), 890–904.
  3. Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., & Wiener, J. (2000). Graph structure in the Web. Computer Networks, 33, 309–320.
  4. Batagelj, V., & Zaveršnik, M. (2012). Generalized cores. http://arxiv.org/abs/cs.DS/0202039.
  5. Boldi, P., Santini, M., & Vigna, S. (2009). PageRank: Functional dependencies. ACM Transactions on Information Systems, 27(4), Article 19.
  6. Brezinski, C., Redivo-Zaglia, M., & Serra-Capizzano, S. (2005). Extrapolation method for PageRank computations. C R Math Acad Sci Paris, 340, 393–397.
  7. Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30, 107–117.
  8. Brin, S., Motwani, R., Page, L., & Winograd, T. (1998). What can you do with a Web in your pocket? Data Engineering Bulletin, 21, 37–47.
  9. Del Corso, G. M., Gullí, A., & Romani, F. (2004). Fast PageRank computation via a sparse linear system. Internet Mathematics, 2(3), 251–273.
  10. Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM (50th Anniversary Issue: 1958–2008), 51(1), 1–13.
  11. Donato, D., Leonardi, S., Millozzi, S., & Tsaparas, P. (2008). Mining the inner structure of the Web graph. Journal of Physics A: Mathematical and Theoretical, 41(22).
  12. Eiron, N., McCurley, K. S., & Tomlin, J. A. (2004). Ranking the web frontier. In Proceedings of the 13th international conference on World Wide Web (pp. 309–318), New York, NY, USA.
  13. Gleich, D., Glynn, P., Golub, G. H., & Greif, C. (2007). Three results on the PageRank vector: Eigenstructure, sensitivity and the derivative. In A. Frommer, M. Mahoney, & D. Szyld (Eds.), Proceedings of the Dagstuhl conference in Web retrieval and numerical linear algebra algorithms.
  14. Gleich, D., Gray, A., Greif, C., & Lau, T. (2010). An inner-outer iteration for computing PageRank. SIAM Journal on Scientific Computing, 32, 349–371.
  15. Gleich, D., Zhukov, L., & Berkhin, P. (2005). Fast parallel PageRank: A linear system approach. WWW2005, Chiba, Japan.
  16. Golub, G. H., & Greif, C. (2006). An Arnoldi-type algorithm for computing PageRank. BIT, 46, 759–771.
  17. Golub, G. H., & Van Loan, C. F. (1996). Matrix computations, 3rd edn. Baltimore and London: The Johns Hopkins University Press.
  18. Haveliwala, T., Kamvar, S., Klein, D., Manning, C., & Golub, G. H. (2003). Computing PageRank using power extrapolation. Stanford University Technical Report.
  19. Higham, N. J. (2002). Accuracy and stability of numerical algorithms, 2nd edn. Philadelphia: SIAM.
  20. Ipsen, I., & Selee, T. (2007). PageRank computation, with special attention to dangling nodes. SIAM Journal on Matrix Analysis and Applications, 29, 1281–1296.
  21. Ipsen, I., & Wills, R. (2006). Mathematical properties and analysis of Google’s PageRank. Bol Soc Esp Mat Apl, 34, 191–196.
  22. Kamvar, S., Haveliwala, T., & Golub, G. H. (2004). Adaptive methods for the computation of PageRank. Linear Algebra and its Applications, 386, 51–65.
  23. Kamvar, S., Haveliwala, T., Manning, C., & Golub, G. H. (2003). Extrapolation methods for accelerating PageRank computations. In Twelfth international World Wide Web conference.
  24. Kamvar, S., Haveliwala, T., Manning, C., & Golub, G. H. (2003). Exploiting the block structure of the Web for computing PageRank. Stanford University Technical Report, SCCM-03-02.
  25. Langville, A., & Meyer, C. (2006a). A reordering for the PageRank problem. SIAM Journal on Scientific Computing, 27, 2112–2120.
  26. Langville, A., & Meyer, C. (2006b). Google’s PageRank and beyond: The science of search engine rankings. Princeton, NJ: Princeton University Press.
  27. Lee, C., Golub, G. H., & Zenios, S. (2007). A two-stage algorithm for computing PageRank and multistage generalizations. Internet Mathematics, 4(4), 299–328.
  28. Lin, Y., Shi, X., & Wei, Y. (2009). On computing PageRank via lumping the Google matrix. Journal of Computational and Applied Mathematics, 224, 702–708.
  29. Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: Bringing order to the Web. Technical Report, Computer Science Department, Stanford University.
  30. Saad, Y. (2003). Iterative methods for sparse linear systems, 2nd edn. Philadelphia, PA: SIAM.
  31. Wills, R., & Ipsen, I. (2009). Ordinal ranking for Google’s PageRank. SIAM Journal on Matrix Analysis and Applications, 30, 1677–1696.
  32. Wu, G., & Wei, Y. (2007). A Power-Arnoldi algorithm for computing PageRank. Numerical Linear Algebra with Applications, 14(7), 521–546.
  33. Wu, G., & Wei, Y. (2010). Arnoldi versus GMRES for computing PageRank: A theoretical contribution to Google’s PageRank problem. ACM Transactions on Information Systems, 28(3), Article 11.
  34. Wu, G., & Wei, Y. (2010). An Arnoldi-Extrapolation algorithm for computing PageRank. Journal of Computational and Applied Mathematics, 234, 3196–3212.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Arts and Sciences, Xuzhou Higher Normal School, Xuzhou, People’s Republic of China
  2. School of Mathematical Sciences, Xuzhou Normal University, Xuzhou, People’s Republic of China
  3. School of Mathematical Sciences and Shanghai Key Laboratory of Contemporary Applied Mathematics, Fudan University, Shanghai, People’s Republic of China
