Abstract

Ranking is perhaps the most important feature of a search engine, as it lets the user efficiently order the huge number of pages matching a query according to their relevance to the user’s information need. Unlike traditional textual search engines, Web information retrieval systems build a ranking by combining at least two kinds of evidence of relevance: the degree of matching of a page (the content score) and the degree of importance of a page (the popularity score). While the content score can be calculated using one of the information retrieval models described in Chap. 3, the popularity score can be calculated from an analysis of the indexed pages’ hyperlink structure using one or more link analysis models. In this chapter we introduce the two most famous link analysis models, PageRank and HITS.
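As a rough illustration of how a popularity score can be derived from hyperlink structure alone, the following minimal sketch runs a PageRank-style power iteration on a toy link graph. It is not the chapter's own formulation: the function names, the damping factor of 0.85, the uniform handling of pages without out-links, the toy graph, and the linear mix in `combined_score` are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank-style power iteration.

    `links` maps each page to the list of pages it links to.
    `damping`, `tol`, and `max_iter` are illustrative defaults (assumptions).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its damped rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank


def combined_score(content, popularity, weight=0.5):
    """Hypothetical linear mix of a content score and a popularity score."""
    return weight * content + (1.0 - weight) * popularity


# Toy web graph (hypothetical): page "c" receives links from "a", "b", and "d".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph the page with the most incoming links, `c`, ends up with the largest score, while `d`, which no page links to, keeps only the baseline share. How a real engine weighs popularity against the content score is not stated here; the linear mix is only one possible choice.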


Notes

  1. The diameter of the graph is logarithmically proportional to the number of its nodes.

  2. More precisely, a power law distribution with a≈2.1.

  3. http://googleblog.blogspot.it/2008/05/introduction-to-google-search-quality.html.

  4. http://googlewebmastercentral.blogspot.it/2011/06/beyond-pagerank-graduating-to.html.

  5. http://www.taoma.com.

  6. http://www.ask.com.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ceri, S., Bozzon, A., Brambilla, M., Della Valle, E., Fraternali, P., Quarteroni, S. (2013). Link Analysis. In: Web Information Retrieval. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39314-3_7

  • DOI: https://doi.org/10.1007/978-3-642-39314-3_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39313-6

  • Online ISBN: 978-3-642-39314-3

  • eBook Packages: Computer Science, Computer Science (R0)
