Clustering Peers Based on Contents for Efficient Similarity Search

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3882)

Abstract

Similarity search is becoming the norm in most real-life applications, such as digital asset management systems. In such systems, users typically want to retrieve documents or objects similar to the terms or examples given in a query. In this paper, we present a system for supporting similarity search in P2P networks that retains many desirable properties of existing P2P systems. To support efficient search, peers are grouped into clusters based on their contents, and the clusters are organized as a structured overlay. Optimizations are employed to improve search performance. Experimental results confirm the effectiveness of the proposed system architecture.
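
The abstract does not say how the clustering or query routing is carried out, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that each peer is summarized by a bag-of-words vector over its documents, that peers are grouped by a greedy single-pass procedure using cosine similarity, and that a query is routed to the cluster whose centroid is most similar. The names `vectorize`, `cluster_peers`, `route_query`, the `threshold` value, and the sample peers are all hypothetical, and the structured overlay that connects clusters is not modeled.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-frequency vector (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_peers(peers, threshold=0.3):
    """Greedy single-pass clustering (an illustrative stand-in, not the paper's method).

    Each peer joins the most similar existing cluster if the similarity to that
    cluster's centroid reaches `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": Counter, "members": [peer_id, ...]}
    for peer_id, docs in peers.items():
        summary = vectorize(" ".join(docs))
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(summary, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(summary), "members": [peer_id]})
        else:
            best["centroid"].update(summary)  # accumulate term counts into the centroid
            best["members"].append(peer_id)
    return clusters


def route_query(query, clusters):
    """Return the cluster whose centroid is most similar to the query."""
    q = vectorize(query)
    return max(clusters, key=lambda c: cosine(q, c["centroid"]))


# Hypothetical peers and their shared documents.
peers = {
    "p1": ["jazz piano recording", "jazz trumpet solo"],
    "p2": ["jazz saxophone recording"],
    "p3": ["mountain landscape photo", "forest landscape photo"],
    "p4": ["beach landscape photo"],
}
clusters = cluster_peers(peers)
target = route_query("jazz recording", clusters)
```

In this sketch a query only needs to be compared against one centroid per cluster rather than against every peer, which is the intuition behind clustering peers by content before searching.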

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shu, Y., Yu, B. (2006). Clustering Peers Based on Contents for Efficient Similarity Search. In: Li Lee, M., Tan, KL., Wuwongse, V. (eds) Database Systems for Advanced Applications. DASFAA 2006. Lecture Notes in Computer Science, vol 3882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733836_25

  • DOI: https://doi.org/10.1007/11733836_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33337-1

  • Online ISBN: 978-3-540-33338-8

  • eBook Packages: Computer Science, Computer Science (R0)
