A Framework for Decentralized Ranking in Web Information Retrieval
Search engines are among the most important services on the web. Most successful search engines use global ranking algorithms to rank the documents crawled into their databases. However, global ranking has two potential problems: high computation cost and potentially poor rankings. Both problems are related to the centralized computation paradigm. We propose to decentralize the task of ranking. This requires two things: a decentralized architecture and a logical framework for ranking computation. In this paper we introduce a ranking algebra that provides such a formal framework. By partitioning and combining rankings, we compute document rankings of large-scale web data sets in a localized fashion. We provide initial results demonstrating that such an approach can ameliorate the above-mentioned problems. The approach represents a step towards P2P Web search engines.
Keywords: search engines, information retrieval, P2P systems, link analysis
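The idea of partitioning and combining rankings can be illustrated with a minimal sketch. It is not the ranking algebra introduced in the paper: it assumes documents are partitioned by site, that each site computes a PageRank-style score from its internal links only, and that local scores are combined by multiplying them with a site-level score derived from cross-site links. The names `pagerank`, `decentralized_ranking`, `site_of`, and the toy link graph are all hypothetical.

```python
from collections import defaultdict


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over the given nodes and directed edges."""
    nodes = list(nodes)
    n = len(nodes)
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


def decentralized_ranking(site_of, edges):
    """Combine per-site local rankings with a site-level ranking (assumed scheme)."""
    sites = defaultdict(list)
    for doc, site in site_of.items():
        sites[site].append(doc)

    # Local step: each site ranks its own documents using only internal links.
    local = {}
    for site, docs in sites.items():
        internal = [(s, d) for s, d in edges
                    if site_of[s] == site and site_of[d] == site]
        local[site] = pagerank(docs, internal)

    # Site-level step: rank the sites using only the links that cross sites.
    cross = {(site_of[s], site_of[d]) for s, d in edges
             if site_of[s] != site_of[d]}
    site_rank = pagerank(sites.keys(), cross)

    # Combination step: weight each local score by the score of its site.
    combined = {doc: site_rank[site_of[doc]] * local[site_of[doc]][doc]
                for doc in site_of}
    order = sorted(combined, key=combined.get, reverse=True)
    return order, combined


# Toy data: two sites, A and B, with two documents each.
site_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
edges = [("a1", "a2"), ("a2", "a1"), ("b1", "b2"), ("a1", "b2"), ("a2", "b2")]
order, combined = decentralized_ranking(site_of, edges)
print(order)
```

In this sketch no step needs the full link graph at once: each local ranking sees only one site's links, and the site-level ranking sees only the much smaller graph of sites. Whether such a combination approximates a global ranking well is exactly the kind of question a formal framework for ranking computation has to answer.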