Abstract
P2P deployments are a natural infrastructure for building distributed search networks. Proposed systems support locating and retrieving all results, but lack the information necessary to rank them. Users, however, are primarily interested in the most relevant results, not necessarily all possible results.
Using random sampling, we extend a class of well-known information retrieval ranking algorithms so that they can be applied in this decentralized setting. We analyze the overhead of our approach and quantify how our system scales with an increasing number of documents, system size, document-to-node mapping (uniform versus non-uniform), and query type (rare versus popular terms). Our analysis and simulations show that a) these extensions are efficient and scale with little overhead to large systems, and b) the accuracy of the results obtained using distributed ranking is comparable to that of a centralized implementation.
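The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the global statistic a ranking function needs is the inverse document frequency (IDF) of a query term, and that a node can draw a uniform random sample of peers. The names `build_network`, `exact_idf`, and `sampled_idf` are hypothetical and introduced here for illustration.

```python
import math
import random


def build_network(num_nodes, docs_per_node, term, term_prob, rng):
    """Toy network (an assumption): each node holds documents as sets of terms."""
    return [
        [{"filler", term} if rng.random() < term_prob else {"filler"}
         for _ in range(docs_per_node)]
        for _ in range(num_nodes)
    ]


def exact_idf(nodes, term):
    """Centralized baseline: IDF computed over every document on every node."""
    docs = [doc for node in nodes for doc in node]
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else None


def sampled_idf(nodes, term, sample_size, rng):
    """Estimate IDF from the documents held by a uniform random sample of nodes."""
    sample = rng.sample(nodes, sample_size)
    docs = [doc for node in sample for doc in node]
    df = sum(1 for doc in docs if term in doc)
    if not docs or df == 0:
        return None  # term not seen in the sample; no estimate available
    return math.log(len(docs) / df)


if __name__ == "__main__":
    rng = random.Random(7)
    nodes = build_network(1000, 10, "p2p", 0.3, rng)
    print("exact IDF:  ", exact_idf(nodes, "p2p"))
    print("sampled IDF:", sampled_idf(nodes, "p2p", 200, rng))
```

In this sketch a popular term is estimated well from a modest sample, while a rare term may not appear in the sample at all, which is one reason the rare-versus-popular distinction in the abstract matters for accuracy.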
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gopalakrishnan, V., Morselli, R., Bhattacharjee, B., Keleher, P., Srinivasan, A. (2007). Distributed Ranked Search. In: Aluru, S., Parashar, M., Badrinath, R., Prasanna, V.K. (eds) High Performance Computing – HiPC 2007. HiPC 2007. Lecture Notes in Computer Science, vol 4873. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77220-0_6
DOI: https://doi.org/10.1007/978-3-540-77220-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77219-4
Online ISBN: 978-3-540-77220-0
eBook Packages: Computer Science, Computer Science (R0)