Content Location in Peer-to-Peer Systems: Exploiting Locality

  • Kunwadee Sripanidkulchai
  • Hui Zhang
Chapter
Part of the Web Information Systems Engineering and Internet Technologies Book Series (WISE, volume 2)

Abstract

Efficient content location is a fundamental problem for decentralized peer-to-peer systems. Gnutella, a popular file-sharing application, relies on flooding queries to all peers. Although flooding is simple and robust, it is not scalable. In this chapter, we explore how to retain the simplicity of Gnutella while addressing its inherent weakness: scalability. We propose two complementary content location solutions that exploit locality to improve scalability. First, we look at temporal locality and find that the popularity of search strings follows a Zipf-like distribution. Caching query results to exploit temporal locality can reduce the amount of traffic seen on the network by a factor of 3 while using only a few megabytes of memory. Second, we exploit a simple yet powerful principle called interest-based locality, which posits that if a peer has a particular piece of content that one is interested in, it is very likely to have other items that one is interested in as well. We propose that peers loosely organize themselves into an interest-based structure on top of the existing Gnutella network. With our algorithm, called interest-based shortcuts, a significant amount of flooding can be avoided, reducing the total load in the system by a factor of 3 to 7 and reducing the time to locate content to only one peer-to-peer hop. We demonstrate the existence of both types of locality and evaluate our solutions using traces of several different content distribution systems, such as the Web and popular peer-to-peer file-sharing applications.
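
The interest-based shortcut idea described in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the `Peer` class, the `locate` and `_remember` methods, the bounded most-recently-useful shortcut list, and the modeling of flooding as a scan over all peers are assumptions made purely for illustration.

```python
from collections import OrderedDict


class Peer:
    """Hypothetical peer holding a set of content names and a bounded shortcut list."""

    def __init__(self, name, content, max_shortcuts=10):
        self.name = name
        self.content = set(content)
        self.max_shortcuts = max_shortcuts  # assumed bound, not from the chapter
        # Most recently useful shortcut is kept last; this ordering is an assumption.
        self.shortcuts = OrderedDict()

    def has(self, item):
        return item in self.content

    def locate(self, item, network):
        """Return (provider, method); method is 'shortcut', 'flood', or None."""
        # Step 1: ask shortcut peers directly (one hop each), most recent first.
        for peer in reversed(list(self.shortcuts.values())):
            if peer.has(item):
                self._remember(peer)
                return peer, "shortcut"
        # Step 2: fall back to flooding, modeled here as a scan of all other peers.
        for peer in network:
            if peer is not self and peer.has(item):
                self._remember(peer)
                return peer, "flood"
        return None, None

    def _remember(self, peer):
        """Add or refresh a shortcut, evicting the least recently useful one if full."""
        self.shortcuts.pop(peer.name, None)
        self.shortcuts[peer.name] = peer
        while len(self.shortcuts) > self.max_shortcuts:
            self.shortcuts.popitem(last=False)


alice = Peer("alice", [])
bob = Peer("bob", ["song-a", "song-b"])
carol = Peer("carol", ["paper-x"])
network = [alice, bob, carol]

first = alice.locate("song-a", network)   # no shortcuts yet, so this floods
second = alice.locate("song-b", network)  # bob is now a shortcut
print(first[1], second[1])  # prints "flood shortcut"
```

In this toy run the first lookup falls back to flooding and records bob as a shortcut; the second lookup, for a different item that bob also holds, is resolved through the shortcut without flooding, which is the behavior interest-based locality predicts.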

Keywords

Peer-to-peer; file-sharing; locality; search; content location

Copyright information

© Springer Science+Business Media, Inc. 2005

Authors and Affiliations

  • Kunwadee Sripanidkulchai (1)
  • Hui Zhang (1)
  1. Carnegie Mellon University, USA
