Abstract
Peer-to-peer applications are used to share large volumes of data. An important requirement of these systems is an efficient method for locating the data of interest in a large collection. Unfortunately, current peer-to-peer systems either offer only exact keyword matching or provide inefficient text search through centralized indexing or flooding. In this paper, we propose a method based on popular Information Retrieval techniques to facilitate content-based searches in peer-to-peer systems. A simulation of the proposed design was implemented, and its performance was evaluated using commonly used test collections, including Ohsumed, which was used for the TREC-9 Filtering Track. The experiments demonstrate that our approach is scalable, as it achieves high recall by visiting only a small subset of the peers.
This research was funded in part by NSF grants EIA 00-80134, IIS 02-09112, and CNF 04-23336.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sahin, O.D., Emekci, F., Agrawal, D., El Abbadi, A. (2005). Content-Based Similarity Search over Peer-to-Peer Systems. In: Ng, W.S., Ooi, B.C., Ouksel, A.M., Sartori, C. (eds) Databases, Information Systems, and Peer-to-Peer Computing. DBISP2P 2004. Lecture Notes in Computer Science, vol 3367. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31838-5_5
DOI: https://doi.org/10.1007/978-3-540-31838-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25233-7
Online ISBN: 978-3-540-31838-5
eBook Packages: Computer Science, Computer Science (R0)