A Novel Hybrid Data Reduction Strategy and Its Application to Intrusion Detection

  • Vitali Herrera-Semenets
  • Osvaldo Andrés Pérez-García
  • Andrés Gago-Alonso
  • Raudel Hernández-León
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10657)


Useless information and the huge volume of data generated by telecommunication services can degrade the efficiency of traditional Intrusion Detection Systems (IDSs). This motivates the development of data preprocessing strategies that improve IDS efficiency. However, improving efficiency through data reduction without degrading the quality of the reduced dataset (i.e., while preserving accuracy during classification) remains a challenge, and the runtime of commonly used strategies is usually high. In this paper, a novel hybrid data reduction strategy is presented. The proposed strategy reduces the number of features and instances in the training collection without greatly affecting the quality of the reduced dataset, and it improves the efficiency of the classification process. Finally, our proposal compares favorably with other hybrid data reduction strategies.
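
The abstract does not spell out the algorithm, so the following is only a rough illustration of what "hybrid" data reduction means in general: a feature selection step combined with an instance selection step on the training collection. The selection criteria used here (dropping features that never vary, then dropping duplicate instances) are placeholder assumptions chosen for brevity, not the strategy proposed in the paper, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# A training row: (feature vector, class label).
Row = Tuple[Sequence[float], str]


def select_features(rows: List[Row]) -> List[int]:
    """Return the indices of features that take more than one value.

    Assumption: a constant feature carries no information for classification.
    """
    if not rows:
        return []
    n_features = len(rows[0][0])
    return [
        j for j in range(n_features)
        if len({features[j] for features, _ in rows}) > 1
    ]


def select_instances(rows: List[Row], keep: List[int]) -> List[Row]:
    """Project rows onto the kept features and drop exact duplicates.

    Assumption: duplicate (features, label) pairs add no information.
    The order of first occurrence is preserved.
    """
    seen = set()
    reduced = []
    for features, label in rows:
        key = (tuple(features[j] for j in keep), label)
        if key not in seen:
            seen.add(key)
            reduced.append(key)
    return reduced


def hybrid_reduce(rows: List[Row]) -> Tuple[List[int], List[Row]]:
    """Reduce features first, then instances."""
    keep = select_features(rows)
    return keep, select_instances(rows, keep)


# Toy training collection: features 0 and 2 never vary.
training = [
    ((0.0, 1.0, 5.0), "normal"),
    ((0.0, 1.0, 5.0), "normal"),
    ((0.0, 2.0, 5.0), "attack"),
    ((0.0, 1.0, 5.0), "normal"),
]
kept, reduced = hybrid_reduce(training)
```

On this toy collection only the middle feature survives, and the four rows collapse to two. A real strategy would use criteria that tolerate some information loss, which is why the abstract frames preserving classification accuracy as the central difficulty.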


Keywords: Data mining · Data reduction · Instance selection · Feature selection



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Vitali Herrera-Semenets (1)
  • Osvaldo Andrés Pérez-García (1)
  • Andrés Gago-Alonso (1)
  • Raudel Hernández-León (1)
  1. Advanced Technologies Application Center (CENATAV), Havana, Cuba
