Abstract
Real-world measurements play an important role in understanding the characteristics and improving the operation of BitTorrent, currently a popular Internet application. As with measuring the Internet, the complexity and scale of the BitTorrent network make a single, complete measurement impractical. While a large number of measurements have already employed diverse sampling techniques to study parts of the BitTorrent network, until now there has been no investigation of their sampling bias, that is, of their ability to objectively represent the characteristics of BitTorrent. In this work we present the first study of the sampling bias in BitTorrent measurements. We first introduce a novel taxonomy of sources of sampling bias in BitTorrent measurements. We then investigate the sampling bias among fifteen long-term BitTorrent measurements completed between 2004 and 2009, and find that different data sources and measurement techniques can lead to significantly different measurement results. Last, we formulate three recommendations to improve the design of future BitTorrent measurements, and estimate the cost of using these recommendations in practice.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, B., Iosup, A., Pouwelse, J., Epema, D., Sips, H. (2010). Sampling Bias in BitTorrent Measurements. In: D’Ambra, P., Guarracino, M., Talia, D. (eds) Euro-Par 2010 - Parallel Processing. Euro-Par 2010. Lecture Notes in Computer Science, vol 6271. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15277-1_46
DOI: https://doi.org/10.1007/978-3-642-15277-1_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15276-4
Online ISBN: 978-3-642-15277-1
eBook Packages: Computer Science, Computer Science (R0)