Annals of Telecommunications, Volume 74, Issue 7–8, pp 473–482

Statistical network protocol identification with unknown pattern extraction

  • Yu Wang (corresponding author)
  • Hanxiao Xue
  • Yang Liu
  • Waixi Liu


Network traffic classification is an enabling technique for network security and management, both in traditional networks and in emerging networks such as the Internet of Things. Because traditional port-based and payload-based methods are becoming less effective, much research attention has been devoted to an alternative approach based on flow-level and packet-level traffic characteristics. A variety of statistical classification schemes have been proposed in this context, but most of them rest on an implicit assumption that all protocols are known in advance and well represented in the training data. This assumption is unrealistic, because real-world networks constantly see emerging traffic patterns and protocols that were previously unknown. In this paper, we revisit the problem by proposing a learning scheme with unknown pattern extraction for statistical protocol identification. The scheme is designed for a more realistic setting, in which we assume that the training data consist only of labeled samples from a limited number of protocols, and the goal is to identify these known patterns within an arbitrary mixture of traffic from both known and unknown protocols. Our experiments on real-world traffic show that the proposed scheme outperforms previous approaches by accurately identifying both known and unknown protocols.
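To make the setting described in the abstract concrete, the following is a minimal, hypothetical sketch of one generic way to separate known from unknown traffic: cluster all flows together, name each cluster after the labeled flows it contains, and report clusters with no labeled flow as unknown. It is not the authors' scheme (the keywords indicate that theirs builds on constrained clustering); the functions `farthest_first_centroids`, `kmeans` and `identify`, the two-dimensional feature vectors and the protocol labels are all illustrative assumptions, and the numbers are toy values rather than data from the paper.

```python
import math
from collections import Counter


def farthest_first_centroids(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point that is farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    centroids = farthest_first_centroids(points, k)
    assign = []
    for _ in range(iters):
        # Assignment step: each flow joins its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def identify(points, labels, k):
    """Hypothetical helper. labels[i] is a known protocol name, or None for
    an unlabeled flow. Clusters that contain no labeled flow are 'unknown'."""
    assign = kmeans(points, k)
    votes = [Counter() for _ in range(k)]
    for a, lab in zip(assign, labels):
        if lab is not None:
            votes[a][lab] += 1
    names = [v.most_common(1)[0][0] if v else "unknown" for v in votes]
    return [names[a] for a in assign]


# Toy flow features (two made-up statistics per flow).
flows = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),       # region seeded with HTTP
         (10.0, 10.0), (10.1, 9.8), (9.9, 10.2),   # region seeded with DNS
         (20.0, 0.0), (19.8, 0.3), (20.2, -0.1)]   # region never labeled
labels = ["HTTP", None, None, "DNS", None, None, None, None, None]
print(identify(flows, labels, k=3))
# prints ['HTTP', 'HTTP', 'HTTP', 'DNS', 'DNS', 'DNS', 'unknown', 'unknown', 'unknown']
```

Choosing k larger than the number of labeled protocols leaves room for clusters that capture previously unseen traffic; the farthest-first seeding is used only to keep this toy example deterministic.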


Keywords: Network security; Traffic classification; Machine learning; Constrained clustering



This work is supported by NSFC Projects 61802080 and 61872102.



Copyright information

© Institut Mines-Télécom and Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Computer Science, Guangzhou University, Guangzhou, China
  2. Department of Electronic and Information Engineering, Guangzhou University, Guangzhou, China
