Network Traffic Text Classification Based on Multi-instance Learning and Principal Component Analysis

Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 571)


Network traffic text classification plays an important role in network security. Traditional machine-learning classification methods, such as supervised and semi-supervised learning algorithms, fall short in several ways: the classification mode is too simple to adapt to diverse classification requirements; the text feature selection method is simple, so classification lacks diversity and accuracy is low; and classification is slow, which makes these methods unsuitable for high-traffic, real-time environments. Multi-instance learning can describe the characteristics of a sample more accurately and comprehensively, and can thereby improve classification performance. In this paper, we combine multi-instance learning classification with principal component analysis (PCA) to select text features from the data sets, removing redundant and uncorrelated features from the original data, and obtain better classification accuracy.


Text classification; Multi-instance learning; Principal component analysis
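The abstract does not specify the algorithm, so the following is only a minimal sketch of how PCA-based feature reduction might be combined with a multi-instance classifier. Everything in it is an assumption made for illustration: the toy data, the labels "normal" and "suspicious", the helper names (`fit_pca`, `project`, `min_hausdorff`, `predict`, `make_bag`), and the choice of a bag-level nearest-neighbour rule using the minimal Hausdorff distance as a stand-in for the paper's multi-instance learner. Each sample is treated as a bag of instance feature vectors; PCA is fitted on all training instances, and bags are compared in the reduced space.

```python
import numpy as np


def fit_pca(instances, n_components):
    """Fit PCA on an (n_instances, n_features) array; return (mean, components)."""
    mean = instances.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(instances - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(bag, mean, components):
    """Map every instance of a bag into the reduced PCA space."""
    return (bag - mean) @ components.T


def min_hausdorff(bag_a, bag_b):
    """Smallest Euclidean distance between any pair of instances from two bags."""
    diffs = bag_a[:, None, :] - bag_b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min()


def predict(bag, train_bags, train_labels):
    """Bag-level 1-nearest-neighbour: return the label of the closest training bag."""
    distances = [min_hausdorff(bag, other) for other in train_bags]
    return train_labels[int(np.argmin(distances))]


def make_bag(rng, shift, n_instances=4):
    """Hypothetical bag: one informative feature, one redundant copy, four noise features."""
    base = rng.normal(shift, 0.3, size=(n_instances, 1))
    redundant = 2.0 * base  # perfectly correlated with `base`
    noise = rng.normal(0.0, 0.05, size=(n_instances, 4))
    return np.hstack([base, redundant, noise])


rng = np.random.default_rng(0)
train_bags = [make_bag(rng, 0.0), make_bag(rng, 0.0),
              make_bag(rng, 5.0), make_bag(rng, 5.0)]
train_labels = ["normal", "normal", "suspicious", "suspicious"]

# Fit PCA on the pooled training instances, then reduce 6 features to 2.
mean, components = fit_pca(np.vstack(train_bags), n_components=2)
reduced_train = [project(bag, mean, components) for bag in train_bags]

# Classify two unseen bags in the reduced space.
pred_high = predict(project(make_bag(rng, 5.0), mean, components),
                    reduced_train, train_labels)
pred_low = predict(project(make_bag(rng, 0.0), mean, components),
                   reduced_train, train_labels)
print(pred_high, pred_low)
```

In this toy setup the redundant feature is absorbed into the first principal component, so the bag-level distance is computed on two dimensions instead of six. A real pipeline would first need a text feature extraction step, which is outside the scope of this sketch.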



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Network & Information Center, Dalian Jiaotong University, Dalian, China
  2. Information Science and Technology College, Dalian Maritime University, Liaoning, China
