
Similarity-Based Document Distribution for Efficient Distributed Information Retrieval

  • Conference paper
Web Information Systems Engineering – WISE 2007 (WISE 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4831)


Abstract

Performing information retrieval (IR) efficiently in a distributed environment is currently one of the main challenges in IR. Document representations must be distributed among nodes in a manner that allows a query processing algorithm to efficiently direct each query to those nodes that contribute to its result. Existing term-based document distribution algorithms do not scale to large collections or to queries with many terms, because they incur heavy network traffic during both the distribution and the query phases.

We propose a novel algorithm for document distribution, namely distance-based document distribution. The distribution obtained by our algorithm allows any IR query to be answered effectively by contacting only a few nodes, independent of both the document collection size and the network size, thereby improving efficiency. We accomplish this by linearizing the information retrieval search space such that it reflects the ranking formula that will later be used for retrieval.
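The abstract does not spell out how the search space is linearized, so the following is only a minimal sketch under stated assumptions, not the paper's algorithm. It assumes documents and queries are term-weight vectors normalized to unit length (for such vectors, ‖a − b‖² = 2 − 2·cos(a, b), so Euclidean distance orders results exactly as cosine similarity does). Each document is mapped to a one-dimensional key equal to its distance from a single reference vector, in the spirit of iDistance-style KNN indexing, and that key space is range-partitioned across nodes. The class `LinearizedIndex`, the reference vector, and the node boundaries are hypothetical. By the triangle inequality, any document within distance r of a query q has a key in [key(q) − r, key(q) + r], so a range query only needs to contact the contiguous run of nodes covering that interval.

```python
import bisect
import math


def normalize(vec):
    """Scale a (non-zero) term-weight vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class LinearizedIndex:
    """Hypothetical sketch: distance-to-reference linearization with the 1-D
    key space range-partitioned across nodes (not the paper's exact method)."""

    def __init__(self, reference, boundaries):
        # boundaries[i] is the inclusive upper key bound of node i (ascending);
        # one extra node holds every key above the last boundary.
        self.reference = normalize(reference)
        self.boundaries = list(boundaries)
        self.nodes = [[] for _ in range(len(self.boundaries) + 1)]

    def key(self, unit_vec):
        # 1-D key: Euclidean distance from the reference vector.
        return math.dist(unit_vec, self.reference)

    def node_for(self, key):
        # Index of the first node whose upper bound is >= key.
        return bisect.bisect_left(self.boundaries, key)

    def insert(self, doc_id, vec):
        v = normalize(vec)
        self.nodes[self.node_for(self.key(v))].append((doc_id, v))

    def range_query(self, query, radius):
        """Return (sorted ids within `radius` of query, nodes contacted).

        Triangle inequality: |key(q) - key(x)| <= dist(q, x), so every match
        has a key in [key(q) - radius, key(q) + radius]; only the contiguous
        run of nodes covering that interval needs to be contacted."""
        q = normalize(query)
        kq = self.key(q)
        first = self.node_for(kq - radius)
        last = self.node_for(kq + radius)
        contacted = list(range(first, last + 1))
        hits = [doc_id
                for n in contacted
                for doc_id, v in self.nodes[n]
                if math.dist(q, v) <= radius]
        return sorted(hits), contacted
```

For example, with `LinearizedIndex([1.0, 0.0], [0.5, 1.0, 1.5])` holding documents `a=[1, 0]`, `b=[0, 1]`, `c=[1, 1]` and `d=[-1, 0]`, the query `range_query([1.0, 0.2], 0.3)` returns `(["a"], [0])`, contacting a single node, and widening the radius to 0.8 returns `(["a", "c"], [0, 1])`. Range partitioning is used here because it keeps each query's candidate nodes contiguous; how keys would map onto nodes of a real overlay network is left open.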

Our experimental evaluation indicates that effective information retrieval can be efficiently accomplished in distributed networks.



Editor information

Boualem Benatallah, Fabio Casati, Dimitrios Georgakopoulos, Claudio Bartolini, Wasim Sadiq, Claude Godart


Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Herschel, S. (2007). Similarity-Based Document Distribution for Efficient Distributed Information Retrieval. In: Benatallah, B., Casati, F., Georgakopoulos, D., Bartolini, C., Sadiq, W., Godart, C. (eds) Web Information Systems Engineering – WISE 2007. WISE 2007. Lecture Notes in Computer Science, vol 4831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76993-4_9


  • DOI: https://doi.org/10.1007/978-3-540-76993-4_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76992-7

  • Online ISBN: 978-3-540-76993-4

  • eBook Packages: Computer Science, Computer Science (R0)
