Where to Start Browsing the Web?

  • Dániel Fogaras
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2877)

Abstract

Both human users and crawlers face the problem of finding good start pages from which to explore a topic. We show how link-based ranking algorithms can help qualify pages as start nodes. We introduce a class of hub ranking methods based on counting the short search paths of the Web. Somewhat surprisingly, PageRank scores computed on the reversed Web graph turn out to be a special case of this class of rank functions. Besides query-based examples, we propose graph-based techniques to evaluate the performance of the introduced ranking algorithms. Centrality analysis experiments show that a small set of top-ranked pages dominates the Web, in the sense that other pages can be reached from them within a few clicks on average; furthermore, removing these nodes rapidly destroys the connectivity of the Web graph. By measuring domination and connectivity decay, we compare and analyze the proposed ranking algorithms solely from the structure of the Web, without the need for human interaction. Apart from ranking algorithms, the existence of central pages is interesting in its own right, providing deeper insight into the Small World property of the Web graph.
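
The path-counting idea and the two graph-based evaluation measures can be illustrated on a toy graph. The following Python sketch is our own illustration, not the paper's exact definitions: the damping factor `decay`, the per-step `weight` function, all helper names, and the choice to simply drop dangling-node mass in PageRank are assumptions. It checks the special-case claim numerically: with `weight(v) = 1/indeg(v)` and `decay = c`, the damped count of outgoing paths, scaled by (1 - c)/n, equals PageRank computed on the reversed graph. `domination` reports how many other pages a seed set reaches and in how many clicks on average, and the hypothetical `without` helper deletes pages so the same measure can be rerun after removal, as a rough stand-in for connectivity decay.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# Toy representation (assumption): page -> pages it links to (original link direction).
Graph = Dict[int, List[int]]


def in_degrees(g: Graph) -> Dict[int, int]:
    """Number of in-links of every page."""
    deg = {u: 0 for u in g}
    for outs in g.values():
        for v in outs:
            deg[v] += 1
    return deg


def path_count_hub_rank(g: Graph, weight: Callable[[int], float],
                        decay: float = 0.85, max_len: int = 30) -> Dict[int, float]:
    """Damped count of outgoing paths of length 0..max_len from each page.

    A path u -> v1 -> ... -> vk contributes decay**k * weight(v1) * ... * weight(vk).
    The `weight` and `decay` parameters are illustrative assumptions.
    """
    layer = {u: 1.0 for u in g}  # length-0 paths: exactly one per page
    score = dict(layer)
    for _ in range(max_len):
        # A path of length k+1 from u is one click u -> v followed by a length-k path from v.
        layer = {u: decay * sum(weight(v) * layer[v] for v in g[u]) for u in g}
        for u in g:
            score[u] += layer[u]
    return score


def reversed_pagerank(g: Graph, c: float = 0.85, iters: int = 30) -> Dict[int, float]:
    """Power iteration for PageRank on the reversed graph, dangling mass dropped.

    On the reversed graph the surfer at v moves to a page u that links to v,
    chosen uniformly among v's in-links, hence the division by indeg[v].
    """
    n = len(g)
    indeg = in_degrees(g)
    r = {u: (1 - c) / n for u in g}
    for _ in range(iters):
        r = {u: (1 - c) / n + c * sum(r[v] / indeg[v] for v in g[u]) for u in g}
    return r


def domination(g: Graph, sources: Set[int]) -> Tuple[float, float]:
    """Multi-source BFS along out-links from `sources`.

    Returns (fraction of the other pages reached, average clicks to reach them).
    """
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for u, d in dist.items() if u not in sources]
    n_others = len(g) - len(sources)
    reached = len(others) / n_others if n_others else 1.0
    avg = sum(others) / len(others) if others else 0.0
    return reached, avg


def without(g: Graph, removed: Set[int]) -> Graph:
    """Copy of g with the given pages, and all links to them, deleted."""
    return {u: [v for v in outs if v not in removed]
            for u, outs in g.items() if u not in removed}


if __name__ == "__main__":
    # Toy Web graph: page 4 links into the cycle 0 -> 2 -> 0; page 3 is a dead end.
    web = {0: [1, 2], 1: [2], 2: [0, 3], 3: [], 4: [0]}
    indeg = in_degrees(web)
    hub = path_count_hub_rank(web, weight=lambda v: 1.0 / indeg[v], decay=0.85, max_len=30)
    rev_pr = reversed_pagerank(web, c=0.85, iters=30)
    scale = (1 - 0.85) / len(web)
    for u in sorted(web):
        print(u, round(hub[u] * scale, 6), round(rev_pr[u], 6))
    print(domination(web, {4}))                 # (1.0, 2.0)
    print(domination(without(web, {0}), {4}))   # (0.0, 0.0)
```

With `iters` equal to `max_len`, the two computations agree term by term under these assumptions, because each power iteration of reversed PageRank adds exactly one more path length to the sum. Choosing a different `weight`, such as a constant, gives another scoring function of the same path-counting form.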

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Dániel Fogaras (1, 2)
  1. Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Budapest, Hungary
  2. Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI), Budapest, Hungary

Personalised recommendations