Abstract
Simulation studies are frequently used to evaluate new peer-to-peer searching techniques as well as existing techniques on new applications. Unless these studies are accurate in their modeling of queries and documents, they may not reflect how search techniques will perform in real networks, leading to incorrect conclusions about which techniques are best. We describe how to model content so that simulations produce accurate results. We present a content model for peer-to-peer networks, which consists of a tripartite graph with edges connecting queries to the documents they match, and documents to the peers they are stored at. Our model also includes a set of statistics describing how often queries match the same documents, and how often similar documents are stored at the same peer. We can construct our tripartite content model by running queries over live data stored at real Internet nodes, and simulation results show that searching techniques do indeed perform differently in simulations using this “real” content model versus a randomly generated model. We then present an algorithm for using real content gathered from a small set of peers (say, 1,000) to generate a synthetic content model for large simulated networks (say, 10,000 nodes or more). Finally, we use a synthetic model generated from World Wide Web documents and queries to compare the performance of several search algorithms that have been reported in the literature.
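The tripartite structure described above can be illustrated with a minimal sketch. The class name `ContentModel`, its method names, and the toy identifiers (`q1`, `d1`, `p1`, and so on) are hypothetical and do not come from the paper. The abstract does not define the co-occurrence statistics precisely, so the Jaccard overlap used here is only one assumed way to measure how often two queries match the same documents.

```python
from collections import defaultdict


class ContentModel:
    """Toy tripartite graph: queries -> documents -> peers (illustrative only)."""

    def __init__(self):
        self.query_docs = defaultdict(set)  # query -> documents it matches
        self.doc_peers = defaultdict(set)   # document -> peers that store it

    def add_match(self, query, doc):
        """Add an edge from a query to a document it matches."""
        self.query_docs[query].add(doc)

    def add_storage(self, doc, peer):
        """Add an edge from a document to a peer that stores it."""
        self.doc_peers[doc].add(peer)

    def peers_for_query(self, query):
        """Return every peer holding at least one document matching the query."""
        peers = set()
        for doc in self.query_docs.get(query, ()):
            peers |= self.doc_peers.get(doc, set())
        return peers

    def query_overlap(self, q1, q2):
        """Jaccard overlap of two result sets (an assumed statistic, not the paper's)."""
        a = self.query_docs.get(q1, set())
        b = self.query_docs.get(q2, set())
        union = a | b
        return len(a & b) / len(union) if union else 0.0


# Small hand-built example: two queries, three documents, three peers.
model = ContentModel()
model.add_match("q1", "d1")
model.add_match("q1", "d2")
model.add_match("q2", "d2")
model.add_match("q2", "d3")
model.add_storage("d1", "p1")
model.add_storage("d2", "p1")
model.add_storage("d2", "p2")
model.add_storage("d3", "p3")

print(sorted(model.peers_for_query("q1")))
print(model.query_overlap("q1", "q2"))
```

A simulator could call something like `peers_for_query` to decide which simulated nodes are able to answer a given query, which is the point at which a realistic content model and a randomly generated one would be expected to diverge.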
© 2004 IFIP International Federation for Information Processing
About this paper
Cite this paper
Cooper, B.F. (2004). A Content Model for Evaluating Peer-to-Peer Searching Techniques. In: Jacobsen, H.-A. (ed.) Middleware 2004. Lecture Notes in Computer Science, vol 3231. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30229-2_2
DOI: https://doi.org/10.1007/978-3-540-30229-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23428-9
Online ISBN: 978-3-540-30229-2