Throughput Analytics of Data Transfer Infrastructures

  • Nageswara S. V. Rao
  • Qiang Liu
  • Zhengchun Liu
  • Rajkumar Kettimuthu
  • Ian Foster
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 270)

Abstract

To support increasingly distributed scientific and big-data applications, powerful data transfer infrastructures are being built with dedicated networks and software frameworks customized to distributed file systems and data transfer nodes. The data transfer performance of such infrastructures critically depends on the combined choices of file, disk, and host systems as well as network protocols and file transfer software, all of which may vary across sites. The randomness of throughput measurements makes it challenging to assess the impact of these choices on the performance of infrastructure or its parts. We propose regression-based throughput profiles by aggregating measurements from sites of the infrastructure, with RTT as the independent variable. The peak values and convex-concave shape of a profile together determine the overall throughput performance of memory and file transfers, and its variations show the performance differences among the sites. We then present projection and difference operators, and coefficients of throughput profiles to characterize the performance of infrastructure and its parts, including sites and file transfer tools. In particular, the utilization-concavity coefficient provides a value in the range [0, 1] that reflects overall transfer effectiveness. We present results of measurements collected using (i) testbed experiments over dedicated 0–366 ms 10 Gbps connections with combinations of TCP versions, file systems, host systems and transfer tools, and (ii) Globus GridFTP transfers over production infrastructure with varying site configurations.
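
As a rough illustration of the idea of a throughput profile, the sketch below pools throughput samples by RTT, uses the sample mean at each RTT as a simple regression estimate, and labels each interior point as concave or convex by comparing it with the chord joining its neighbours. This is a minimal sketch under stated assumptions, not the paper's procedure: the function names `throughput_profile` and `shape_labels` are hypothetical, the sample values are made up for illustration and are not measurements from the paper, and the utilization-concavity coefficient is not computed because its definition is not given in this abstract.

```python
from collections import defaultdict
from statistics import mean


def throughput_profile(measurements):
    """Return [(rtt_ms, mean_gbps), ...] sorted by RTT.

    The sample mean at each RTT serves as the regression estimate
    (an assumption made for this sketch).
    """
    by_rtt = defaultdict(list)
    for rtt_ms, gbps in measurements:
        by_rtt[rtt_ms].append(gbps)
    return [(rtt, mean(vals)) for rtt, vals in sorted(by_rtt.items())]


def shape_labels(profile):
    """Label each interior RTT as 'concave', 'convex', or 'linear'.

    A point is concave if it lies above the chord joining its two
    neighbours, and convex if it lies below that chord.
    """
    labels = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(profile, profile[1:], profile[2:]):
        chord = y0 + (y2 - y0) * (x1 - x0) / (x2 - x0)
        if y1 > chord:
            labels.append((x1, "concave"))
        elif y1 < chord:
            labels.append((x1, "convex"))
        else:
            labels.append((x1, "linear"))
    return labels


# Made-up samples (RTT in ms, throughput in Gbps) pooled from two
# hypothetical sites; these are NOT results from the paper.
samples = [
    (0, 9.0), (0, 10.0),
    (100, 8.0), (100, 9.0),
    (200, 4.0), (200, 5.0),
    (300, 2.5), (300, 3.5),
]

profile = throughput_profile(samples)
peak_gbps = max(gbps for _, gbps in profile)
labels = shape_labels(profile)
```

With these made-up samples the profile is concave at the lower RTT and convex at the higher one, which is the kind of convex-concave shape the abstract refers to; the peak value is read directly off the profile.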

Keywords

Data transfer; Infrastructure; Throughput profile

Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2019

Authors and Affiliations

  • Nageswara S. V. Rao (1)
  • Qiang Liu (1)
  • Zhengchun Liu (2)
  • Rajkumar Kettimuthu (2)
  • Ian Foster (2)

  1. Oak Ridge National Laboratory, Oak Ridge, USA
  2. Argonne National Laboratory, Argonne, USA
