Abstract
Efficient content location is a fundamental problem for decentralized peer-to-peer systems. Gnutella, a popular file-sharing application, relies on flooding queries to all peers. Although flooding is simple and robust, it is not scalable. In this chapter, we explore how to retain the simplicity of Gnutella while addressing its inherent weakness: scalability. We propose two complementary content location solutions that exploit locality to improve scalability. First, we look at temporal locality and find that the popularity of search strings follows a Zipf-like distribution. Caching query results to exploit temporal locality can reduce the traffic seen on the network by a factor of 3 while using only a few megabytes of memory. Second, we exploit a simple yet powerful principle called interest-based locality, which posits that if a peer has a particular piece of content that one is interested in, it is very likely to have other items that one is interested in as well. We propose that peers loosely organize themselves into an interest-based structure on top of the existing Gnutella network. With our algorithm, called interest-based shortcuts, a significant amount of flooding can be avoided, reducing the total load in the system by a factor of 3 to 7 and reducing the time to locate content to only one peer-to-peer hop. We demonstrate the existence of both types of locality and evaluate our solutions using traces of several different content distribution systems, such as the Web and popular peer-to-peer file-sharing applications.
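The interest-based shortcut idea described in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the `Peer` class, the `max_shortcuts` bound, the most-recently-useful ordering, and the linear scan standing in for a Gnutella flood are all assumptions made for illustration. The sketch only shows the general pattern of asking previously useful peers first and falling back to flooding on a miss.

```python
from collections import OrderedDict


class Peer:
    """A peer holding some content and a bounded list of shortcut peers."""

    def __init__(self, name, content, max_shortcuts=10):
        self.name = name
        self.content = set(content)
        self.max_shortcuts = max_shortcuts  # assumed bound, not from the source
        self.shortcuts = OrderedDict()      # most recently useful peer is last

    def has(self, item):
        return item in self.content

    def remember(self, peer):
        """Record a peer that answered a query, evicting the oldest if full."""
        self.shortcuts.pop(peer.name, None)
        self.shortcuts[peer.name] = peer
        if len(self.shortcuts) > self.max_shortcuts:
            self.shortcuts.popitem(last=False)

    def locate(self, item, all_peers):
        """Return (provider, method), trying shortcuts before flooding."""
        # Ask the most recently useful shortcuts first (one hop each).
        for peer in reversed(list(self.shortcuts.values())):
            if peer.has(item):
                self.remember(peer)
                return peer, "shortcut"
        # Fallback: a linear scan used here as a stand-in for a flood.
        for peer in all_peers:
            if peer is not self and peer.has(item):
                self.remember(peer)
                return peer, "flood"
        return None, "miss"


if __name__ == "__main__":
    alice = Peer("alice", [])
    bob = Peer("bob", ["song-a", "song-b"])
    carol = Peer("carol", ["paper-x"])
    peers = [alice, bob, carol]
    print(alice.locate("song-a", peers)[1])  # flood
    print(alice.locate("song-b", peers)[1])  # shortcut
```

In the demo, the first lookup has to fall back to the flood stand-in, after which `bob` becomes a shortcut for `alice`. The second lookup is then answered directly by that shortcut, which is the one-hop case the abstract refers to.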
Copyright information
© 2005 Springer Science+Business Media, Inc.
About this chapter
Cite this chapter
Sripanidkulchai, K., Zhang, H. (2005). Content Location in Peer-to-Peer Systems: Exploiting Locality. In: Tang, X., Xu, J., Chanson, S.T. (eds) Web Content Delivery. Web Information Systems Engineering and Internet Technologies Book Series, vol 2. Springer, Boston, MA. https://doi.org/10.1007/0-387-27727-7_4
DOI: https://doi.org/10.1007/0-387-27727-7_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24356-6
Online ISBN: 978-0-387-27727-1
eBook Packages: Computer Science, Computer Science (R0)