
RecFlow: SDN-based receiver-driven flow scheduling in datacenters

  • Aadil Zia Khan
  • Ihsan Ayyub Qazi
Article

Abstract

Datacenter applications (e.g., web search, recommendation systems, and social networking) are designed with high fanout to achieve scalable performance. A corollary of this design is frequent fabric congestion (e.g., due to incast or imperfect hashing), even when network utilization is low. Such congestion exhibits both temporal and spatial (intra-rack and inter-rack) variation. Two basic design paradigms are used to address this problem, and current solutions lie somewhere between them. At one end are arbiter-based approaches, in which senders poll a centralized arbiter and collectively obey global scheduling decisions. At the other end are self-adjusting endpoint-based approaches, in which senders independently adjust their transmission rates based on network congestion. The former incurs greater overhead, while the latter accepts sub-optimality in exchange for lower complexity. Our work seeks a middle ground: the optimality of arbiter-based approaches with the simplicity of self-adjusting endpoint-based approaches. Our key design principle is that, since the receiver has complete information about the flows destined for it, the receiver itself can orchestrate those flows, rather than relying on a centralized arbiter or on independent sender-side scheduling decisions. Because multiple receivers may share a bottleneck link, datapath visibility should be used to ensure fair sharing of the bottleneck capacity among receivers with minimum overhead. We propose RecFlow, a receiver-based proactive congestion control scheme. RecFlow uses the path visibility provided by OpenFlow to track changing bottlenecks on the fly, and it spaces TCP acknowledgements to prevent traffic bursts and to ensure that no receiver exceeds its fair share of the bottleneck capacity. The goal is to reduce buffer overflows while maintaining fairness among flows and high link utilization. Using extensive simulations and a real testbed evaluation, we show that, compared to the state of the art, RecFlow achieves up to a 6× improvement in the inter-rack scenario and 1.5× in the intra-rack scenario while sharing the link capacity fairly among all flows.
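
To make the receiver-side pacing idea concrete, the sketch below shows one way a receiver could space TCP acknowledgements against its fair share of a bottleneck link. This is a minimal illustration and not the authors' implementation: the class name AckPacer, its inputs (bottleneck capacity and the number of receivers sharing that bottleneck), and the assumption that those values come from OpenFlow statistics queries are all hypothetical, and time.sleep stands in for whatever fine-grained pacing mechanism a real datapath would use.

import time
from collections import deque

class AckPacer:
    """Releases buffered TCP ACKs at a rate matching the receiver's fair share."""

    def __init__(self, bottleneck_capacity_bps, receivers_on_bottleneck, mss_bytes=1460):
        self.mss_bytes = mss_bytes
        self.pending_acks = deque()          # ACKs waiting to be released
        self.update_share(bottleneck_capacity_bps, receivers_on_bottleneck)

    def update_share(self, capacity_bps, n_receivers):
        # Each released ACK clocks out roughly one MSS from the sender, so the
        # ACK spacing caps the sender's effective rate at the receiver's share.
        fair_share_bps = capacity_bps / max(1, n_receivers)
        self.ack_interval_s = (self.mss_bytes * 8) / fair_share_bps

    def enqueue(self, ack):
        self.pending_acks.append(ack)

    def drain(self, send_fn):
        # Release queued ACKs no faster than one per ack_interval_s.
        while self.pending_acks:
            send_fn(self.pending_acks.popleft())
            time.sleep(self.ack_interval_s)

# Example: a 10 Gbps bottleneck shared by 4 receivers gives roughly 4.7 us
# between ACKs; a capacity drop or an extra receiver simply shrinks the share.
pacer = AckPacer(10e9, 4)
pacer.enqueue(b"ACK-1")
pacer.drain(lambda ack: None)   # stand-in for the actual send path

When the (hypothetical) OpenFlow statistics indicate that the bottleneck has moved or that the set of competing receivers has changed, the receiver would call update_share again, so the pacing always reflects the current fair share rather than a static configuration.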

Keywords

Incast · Flow scheduling · Software-defined networks · Datacenters


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Lahore University of Management Sciences, Lahore, Pakistan
  2. UC Berkeley, Berkeley, USA
