Abstract
In recent years, information retrieval methods based on link analysis have been developed; PageRank and HITS are two typical examples. Following the hierarchical organization of Web pages, the Web graph can be partitioned into blocks at different levels, such as the page level, directory level, host level, and domain level. Given such blocks, the hyperlinks among pages can be divided into intra-hyperlinks, which stay within a block, and inter-hyperlinks, which cross block boundaries. Several approaches have suggested that intra-hyperlinks within a host may be less useful in computing PageRank. However, there are no reports on how, concretely, intra- and inter-hyperlinks affect PageRank. Furthermore, intra-hyperlink and inter-hyperlink are relative concepts that depend on the block level chosen. Which level, then, is optimal for distinguishing intra- from inter-hyperlinks? And what weight ratio between intra- and inter-hyperlinks ultimately improves the performance of the PageRank algorithm? In this paper, we analyze the link distribution at the different block levels and evaluate the importance of intra- and inter-hyperlinks to PageRank on the TREC Web Track data set. Experiments show that retrieval achieves the best performance when blocks are set at the host level and the ratio of intra-hyperlink weight to inter-hyperlink weight is 1:4.
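The abstract does not spell out how the link weights enter the PageRank computation, so the following is only a minimal sketch under stated assumptions: each outgoing link gets weight 1 if it stays within the source page's host and weight 4 if it leaves it, and a page's rank is distributed over its out-links in proportion to those weights. The function names (`host_block`, `weighted_pagerank`), the damping factor of 0.85, the uniform handling of dangling pages, and the fixed iteration count are all illustrative choices, not details taken from the paper.

```python
from urllib.parse import urlsplit


def host_block(url):
    """Hypothetical block key at host level: the URL's network location."""
    return urlsplit(url).netloc.lower()


def weighted_pagerank(links, block_of=host_block, intra_weight=1.0,
                      inter_weight=4.0, damping=0.85, iterations=50):
    """Power-iteration PageRank with block-dependent link weights.

    links: iterable of (source_url, target_url) pairs.
    A link is "intra" when both ends map to the same block, else "inter".
    """
    links = list(links)
    pages = sorted({page for edge in links for page in edge})
    n = len(pages)

    # Weighted out-links per page; repeated links accumulate their weight.
    out = {page: {} for page in pages}
    for src, dst in links:
        same_block = block_of(src) == block_of(dst)
        weight = intra_weight if same_block else inter_weight
        out[src][dst] = out[src].get(dst, 0.0) + weight

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Teleportation share, identical for every page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        dangling_mass = 0.0
        for page in pages:
            total = sum(out[page].values())
            if total == 0.0:
                # Assumption: pages without out-links spread rank uniformly.
                dangling_mass += rank[page]
                continue
            for target, weight in out[page].items():
                new_rank[target] += damping * rank[page] * weight / total
        for page in pages:
            new_rank[page] += damping * dangling_mass / n
        rank = new_rank
    return rank


example_links = [
    ("http://a.com/1", "http://a.com/2"),  # intra-host link
    ("http://a.com/1", "http://b.com/x"),  # inter-host link
]
ranks = weighted_pagerank(example_links)
```

In this toy graph the only difference between `http://a.com/2` and `http://b.com/x` is the kind of link pointing at them, so the inter-host target ends up with the higher score. Passing a different `block_of` function (for example one keyed on directory or domain) would move the intra/inter boundary to another block level.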
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, XM., Xue, GR., Song, WG., Zeng, HJ., Chen, Z., Ma, WY. (2004). Exploiting PageRank at Different Block Level. In: Zhou, X., Su, S., Papazoglou, M.P., Orlowska, M.E., Jeffery, K. (eds) Web Information Systems – WISE 2004. WISE 2004. Lecture Notes in Computer Science, vol 3306. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30480-7_26
DOI: https://doi.org/10.1007/978-3-540-30480-7_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23894-2
Online ISBN: 978-3-540-30480-7
eBook Packages: Springer Book Archive