Lumping algorithms for computing Google’s PageRank and its derivative, with attention to unreferenced nodes
In this paper, we introduce five types of nodes for lumping the Web matrix and give a unified presentation of some popular lumping methods for PageRank. We show that the PageRank problem can be reduced to computing the PageRank of the strongly non-dangling and referenced nodes, from which the full PageRank vector can be easily derived by recursion formulas. Our new lumping strategy reduces the original PageRank problem to a much smaller one, and it is much cheaper than the recursive reordering scheme. Furthermore, we discuss the sensitivity of the PageRank vector and present a lumping algorithm for computing its first-order derivative. Numerical experiments show that the new algorithms are favorable when the matrix is large and the damping factor is high.
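As a rough illustration of the kind of reduction described above, the sketch below computes PageRank and its derivative with respect to the damping factor α by solving only a reduced system. It is not the authors' algorithm. It uses a simpler, hypothetical three-way split (unreferenced nodes, referenced dangling nodes, referenced non-dangling nodes) instead of the paper's five node types, performs only one level of reduction (nodes that become unreferenced after removal are not peeled off again), uses plain Jacobi sweeps for the reduced system, and assumes the standard model G = α(H + dv^T) + (1 − α)ev^T, in which dangling rows are replaced by the teleportation vector v. Under that model the unnormalized vector x solving x^T(I − αH) = v^T is proportional to PageRank; an unreferenced node j has a zero column in H, so x_j = v_j exactly, and dangling nodes are recovered afterwards by a formula. The function name `lumped_pagerank` and its parameters are illustrative only.

```python
from typing import List, Optional, Tuple


def lumped_pagerank(out: List[List[int]], alpha: float = 0.85,
                    v: Optional[List[float]] = None, tol: float = 1e-12,
                    max_iter: int = 100_000) -> Tuple[List[float], List[float]]:
    """Return (pi, dpi/dalpha) for G = alpha*(H + d v^T) + (1 - alpha)*e v^T.

    Sketch only: out[i] lists the distinct out-neighbours of node i, and
    H[i][j] = 1/len(out[i]) for j in out[i].
    """
    n = len(out)
    v = [1.0 / n] * n if v is None else v
    indeg = [0] * n
    for i in range(n):
        for j in out[i]:
            indeg[j] += 1
    # Hypothetical three-way split (not the paper's five node types).
    U = [j for j in range(n) if indeg[j] == 0]                # unreferenced
    S = [j for j in range(n) if indeg[j] > 0 and out[j]]      # referenced, non-dangling
    D = [j for j in range(n) if indeg[j] > 0 and not out[j]]  # referenced, dangling

    def push(vals: List[float], sources: List[int]) -> List[float]:
        """acc[j] = sum over i in sources of vals[i] * H[i][j]."""
        acc = [0.0] * n
        for i in sources:
            if out[i]:  # dangling rows of H are zero
                share = vals[i] / len(out[i])
                for j in out[i]:
                    acc[j] += share
        return acc

    def solve_core(r: List[float]) -> List[float]:
        """Jacobi sweeps for z_S = r_S + alpha * H_SS^T z_S (a contraction for alpha < 1)."""
        z = [0.0] * n
        for j in S:
            z[j] = r[j]
        for _ in range(max_iter):
            acc = push(z, S)
            new = [0.0] * n
            delta = 0.0
            for j in S:
                new[j] = r[j] + alpha * acc[j]
                delta += abs(new[j] - z[j])
            z = new
            if delta < tol:
                break
        return z

    # Unnormalized PageRank x solves x^T (I - alpha*H) = v^T.
    x = [0.0] * n
    for j in U:
        x[j] = v[j]  # zero column of H, so x_j = v_j exactly
    from_U = push(x, U)
    r = [v[j] + alpha * from_U[j] for j in range(n)]
    xs = solve_core(r)  # the only iterative solve, of size |S|
    for j in S:
        x[j] = xs[j]
    from_S = push(x, S)
    for j in D:
        x[j] = r[j] + alpha * from_S[j]  # recovered by a formula, no iteration

    # Derivative y = dx/dalpha: y_U = 0 and
    # y_S^T (I - alpha*H_SS) = v_U^T H_US + x_S^T H_SS (same reduced operator).
    r2 = [from_U[j] + from_S[j] for j in range(n)]
    y = solve_core(r2)  # zero outside S
    y_from_S = push(y, S)
    for j in D:
        y[j] = r2[j] + alpha * y_from_S[j]

    # pi = x / sum(x), so dpi = y / s - x * sum(y) / s^2.
    s, ds = sum(x), sum(y)
    pi = [xj / s for xj in x]
    dpi = [yj / s - xj * ds / s ** 2 for xj, yj in zip(x, y)]
    return pi, dpi
```

For example, `lumped_pagerank([[1, 2], [2], [0], [2, 4], [], [], [4]])` treats nodes 3, 5 and 6 as unreferenced and node 4 as referenced dangling, so the Jacobi sweeps run only over nodes 0 to 2. The derivative reuses the same reduced operator I − αH_SS, so in this sketch it costs one extra solve of size |S| rather than a solve over the whole graph.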
Keywords: Google PageRank; Web information retrieval; Dangling nodes; Unreferenced nodes
We would like to express our sincere thanks to Professor Justin Zobel and two anonymous reviewers for their invaluable suggestions, which greatly improved the presentation of this paper. We are also grateful to Dr. David Gleich and Professor Tim Davis for data files of the Web matrices, and to Dr. Amy N. Langville for providing us with MATLAB code of the recursive reordering algorithm (Langville and Meyer 2006a). Moreover, we thank Dr. Xing-hua Shi for MATLAB code of Algorithms 1 and 2. Finally, Gang Wu and Qing Yu would like to thank the School of Mathematical Sciences of Xuzhou Normal University for the use of its facilities during the development of this project. Zhengke Miao is supported by the National Natural Science Foundation of China under grants 10871166 and 11171288. Gang Wu is supported by the National Science Foundation of China under grants 10901132 and 11171289, the Qing-Lan Project of Jiangsu Province, and the 333 Project of Jiangsu Province. Yimin Wei is supported by the National Natural Science Foundation of China under grant 10871051, the Doctoral Program of the Ministry of Education under grant 20090071110003, the 973 Program Project under grant 2010CB327900, the Shanghai Education Committee under Dawn Project 08SG01, and the Shanghai Science & Technology Committee.