Bridging machine learning and computer network research: a survey

  • Review Paper
CCF Transactions on Networking

Abstract

With the rapid development of artificial intelligence (AI), a series of related applications are emerging and driving an all-round transformation of the industry. As the major technology of AI, machine learning (ML) shows great potential for solving network challenges. Network optimization, in turn, brings significant performance gains for ML applications, in particular distributed machine learning. In this paper, we survey work that combines ML technologies with network research.

Notes

  1. In this paper, we use node and machine as synonyms.

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 61772305.

Author information

Corresponding author

Correspondence to Dan Li.

About this article

Cite this article

Cheng, Y., Geng, J., Wang, Y. et al. Bridging machine learning and computer network research: a survey. CCF Trans. Netw. 1, 1–15 (2019). https://doi.org/10.1007/s42045-018-0009-7

  • DOI: https://doi.org/10.1007/s42045-018-0009-7
