Abstract
Similarity search is becoming the norm in many real-world applications, such as digital asset management systems. In such systems, users typically want to retrieve documents or objects similar to the terms or examples given in a query. In this paper, we present a system that supports similarity search in P2P networks while retaining many desirable properties of existing P2P systems. To make search efficient, peers are grouped into clusters based on their contents, and the clusters are organized as a structured overlay. Further optimizations are employed to improve search performance. Experimental results confirm the effectiveness of the proposed system architecture.
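The idea of clustering peers by content and routing a query to the most relevant cluster can be illustrated with a minimal sketch. This is not the system described in the paper: the term-weight vectors, the k-means-style assignment, the cosine measure, and all names (`cluster_peers`, `route_query`, the sample peers) are assumptions made purely for illustration, and the structured overlay that connects the clusters is not modeled.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise average of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def cluster_peers(peers, seeds, rounds=5):
    """Hypothetical k-means-style grouping of peers by content summary.

    `peers` maps a peer name to its term-weight vector; `seeds` names the
    peers whose vectors initialise the cluster centroids (an assumption).
    """
    centroids = [dict(peers[s]) for s in seeds]
    assignment = {}
    for _ in range(rounds):
        # Assign every peer to the centroid it is most similar to.
        assignment = {
            name: max(range(len(centroids)),
                      key=lambda i: cosine(vec, centroids[i]))
            for name, vec in peers.items()
        }
        # Recompute each centroid from its current members.
        for i in range(len(centroids)):
            members = [peers[n] for n, c in assignment.items() if c == i]
            if members:
                centroids[i] = centroid(members)
    return centroids, assignment


def route_query(query, centroids, assignment, peers, top_k=2):
    """Pick the most similar cluster, then rank only that cluster's peers."""
    best = max(range(len(centroids)), key=lambda i: cosine(query, centroids[i]))
    members = [n for n, c in assignment.items() if c == best]
    ranked = sorted(members, key=lambda n: cosine(query, peers[n]), reverse=True)
    return best, ranked[:top_k]


# Illustrative data only: four peers with made-up content summaries.
peers = {
    "p1": {"image": 3.0, "photo": 2.0},
    "p2": {"photo": 3.0, "camera": 1.0},
    "p3": {"audio": 3.0, "music": 2.0},
    "p4": {"music": 3.0, "song": 1.0},
}
centroids, assignment = cluster_peers(peers, seeds=["p1", "p3"])
cluster_id, hits = route_query({"photo": 1.0}, centroids, assignment, peers)
print(cluster_id, hits)
```

In this toy setting the query is compared against two centroids and then only against the peers of the chosen cluster, rather than against every peer in the network, which is the kind of saving that content-based clustering aims for.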
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shu, Y., Yu, B. (2006). Clustering Peers Based on Contents for Efficient Similarity Search. In: Li Lee, M., Tan, KL., Wuwongse, V. (eds) Database Systems for Advanced Applications. DASFAA 2006. Lecture Notes in Computer Science, vol 3882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733836_25
DOI: https://doi.org/10.1007/11733836_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33337-1
Online ISBN: 978-3-540-33338-8
eBook Packages: Computer Science (R0)