Hybrid Connection and Host Clustering for Community Detection in Spatial-Temporal Network Data

  • Conference paper
ECML PKDD 2020 Workshops (ECML PKDD 2020)

Abstract

Network data clustering and sequential data mining are large fields of research, but how to combine them to analyze spatial-temporal network data remains a technical challenge. This study investigates a novel combination of two sequential similarity methods (Dynamic Time Warping and N-grams with cosine distance) with two state-of-the-art unsupervised network clustering algorithms (Hierarchical Density-based Clustering and Stochastic Block Models). A popular way to combine such methods is to first cluster the sequential network data, which yields connection types, and then to cluster the hosts in the network conditioned on these types. In contrast, our approach clusters nodes and edges in a single step, i.e., without passing the output of a first clustering step as input to a second. We achieve this by implementing sequential distances as covariates for host clustering. Although fully unsupervised, our method outperforms many existing approaches. To the best of our knowledge, the only approaches with comparable performance require manual filtering of connections and feature engineering steps, whereas our method is applied to raw network traffic. We apply our pipeline to the problem of detecting infected hosts (network nodes) from logs of unlabelled network traffic (sequential data). On data from the Stratosphere IPS project (CTU-Malware-Capture-Botnet-91), which includes malicious (Conficker botnet) as well as benign hosts, we show that our method perfectly separates peripheral, benign, and malicious hosts into different clusters. We replicate our results on the well-known ISOT dataset (Storm, Waledac, Zeus botnets) with comparable performance: conjointly, 99.97% of nodes were categorized correctly.
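As an illustration of the two sequential similarity measures named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes each connection is summarised as a numeric sequence (for example packet sizes), uses the absolute difference as the local cost for Dynamic Time Warping, and uses trigram counts for the cosine distance. The function names `dtw_distance` and `ngram_cosine_distance`, the default `n=3`, and the sample sequences are hypothetical choices made for this example.

```python
from collections import Counter
from math import sqrt


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    Assumption: the local cost is the absolute difference of two values.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def ngram_profile(seq, n):
    """Count every contiguous n-gram in a sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def ngram_cosine_distance(a, b, n=3):
    """Cosine distance (1 - cosine similarity) between n-gram count profiles."""
    pa, pb = ngram_profile(a, n), ngram_profile(b, n)
    dot = sum(pa[g] * pb[g] for g in pa.keys() & pb.keys())
    norm_a = sqrt(sum(v * v for v in pa.values()))
    norm_b = sqrt(sum(v * v for v in pb.values()))
    if norm_a == 0 or norm_b == 0:
        # A sequence shorter than n has no n-grams; treat as maximally distant.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical packet-size sequences for two connections.
    conn_a = [60, 1500, 1500, 60, 60]
    conn_b = [60, 1500, 60, 60]
    print(dtw_distance(conn_a, conn_b))
    print(ngram_cosine_distance(conn_a, conn_b, n=2))
```

Pairwise distances of this kind, computed between connections, are what the abstract describes as entering the host clustering as covariates; how they are aggregated per host pair is not specified in the abstract and is left out of this sketch.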

Author information

Corresponding author

Correspondence to Mark Patrick Roeling.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Roeling, M.P., Nadeem, A., Verwer, S. (2020). Hybrid Connection and Host Clustering for Community Detection in Spatial-Temporal Network Data. In: Koprinska, I., et al. ECML PKDD 2020 Workshops. ECML PKDD 2020. Communications in Computer and Information Science, vol 1323. Springer, Cham. https://doi.org/10.1007/978-3-030-65965-3_12

  • DOI: https://doi.org/10.1007/978-3-030-65965-3_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-65964-6

  • Online ISBN: 978-3-030-65965-3

  • eBook Packages: Computer Science (R0)