A fast unsupervised preprocessing method for network monitoring

Andreoni Lopez, Martin; Mattos, Diogo M. F.; Duarte, Otto Carlos M. B.; Pujolle, Guy

doi:10.1007/s12243-018-0663-2

Published: 31 August 2018

Volume 74, pages 139–155, (2019)

Annals of Telecommunications

Martin Andreoni Lopez¹,² (ORCID: orcid.org/0000-0002-4170-4341),
Diogo M. F. Mattos³,
Otto Carlos M. B. Duarte¹ &
Guy Pujolle²

Abstract

Identifying network misuse takes days or even weeks, and network administrators usually neglect zero-day threats until a large number of malicious users exploit them. Moreover, security applications, such as anomaly detection and attack mitigation systems, must apply real-time monitoring to reduce the impact of security incidents. Thus, information processing time should be as short as possible to enable an effective defense against attacks. In this paper, we present a fast preprocessing method for network traffic classification based on feature correlation and feature normalization. Our proposed method couples a normalization algorithm with a feature selection algorithm. We evaluate the proposed algorithms on three different datasets with eight different machine learning classification algorithms. Our proposed normalization algorithm reduces the classification error rate when compared with traditional methods. Our feature selection algorithm chooses an optimized subset of features, improving accuracy by more than 11% with a 100-fold reduction in processing time when compared to traditional feature selection and feature reduction algorithms. The preprocessing method operates on both batch and streaming data and is able to detect concept drift.
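
The abstract names two building blocks, feature normalization and correlation-based feature selection, without giving the algorithms themselves. As a rough illustration of how such a pipeline might be wired together, the sketch below uses two generic stand-ins: min-max scaling and a greedy filter that drops any feature whose absolute Pearson correlation with an already kept feature reaches a threshold. These are assumptions chosen for illustration, not the authors' proposed algorithms. The function names, the 0.95 threshold, and the toy flow records are all hypothetical.

```python
from math import sqrt


def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns are mapped to 0.0."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_cols.append([(v - lo) / span if span else 0.0 for v in col])
    # Transpose back to row-major order.
    return [list(r) for r in zip(*scaled_cols)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def drop_correlated(rows, threshold=0.95):
    """Greedy unsupervised filter (no labels used): keep a column only if its
    absolute correlation with every previously kept column is below threshold.
    Returns the indices of the kept columns."""
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Hypothetical flow records: [packets, bytes, duration_seconds]
flows = [
    [10, 1000, 0.5],
    [20, 2000, 0.1],
    [30, 3000, 0.9],
    [40, 4000, 0.3],
]

normalized = min_max_normalize(flows)
selected = drop_correlated(normalized)
print(selected)  # indices of the features that survive the filter
```

In this toy data the byte count is an exact multiple of the packet count, so the filter treats it as redundant and keeps only the packet and duration columns. A streaming variant would need running estimates of the minimum, maximum, and correlations instead of full passes over the data.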

Notes

Features refer to the original set of attributes that describe the data. Variables refer to the input of the machine learning algorithms applied over the data. If no preprocessing method handles the original data, the set of variables and the set of features are the same.
Anonymized data can be requested by emailing the authors.

Acknowledgments

The authors would like to thank Antonio Lobato, Igor Alvarenga, and Igor Sanz for their significant contributions to obtaining the results.

Funding

This research is supported by CNPq, CAPES, FAPERJ, and FAPESP (2015/24514-9, 2015/24485-9, and 2014/50937-1).

Author information

Authors and Affiliations

Universidade Federal do Rio de Janeiro - GTA/COPPE/UFRJ, Rio de Janeiro, Brazil
Martin Andreoni Lopez & Otto Carlos M. B. Duarte
CNRS, Laboratoire d’Informatique de Paris 6, Sorbonne Université, F-75005, Paris, France
Martin Andreoni Lopez & Guy Pujolle
Universidade Federal de Fluminense - (UFF), Niteroi, Brazil
Diogo M. F. Mattos

Corresponding author

Correspondence to Martin Andreoni Lopez.

About this article

Cite this article

Andreoni Lopez, M., Mattos, D.M.F., Duarte, O.C.M.B. et al. A fast unsupervised preprocessing method for network monitoring. Ann. Telecommun. 74, 139–155 (2019). https://doi.org/10.1007/s12243-018-0663-2

Received: 05 June 2018
Accepted: 21 August 2018
Published: 31 August 2018
Issue Date: 01 April 2019
DOI: https://doi.org/10.1007/s12243-018-0663-2
