Robust Malicious Domain Detection

  • Nitay Hason
  • Amit Dvir
  • Chen Hajaj
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12161)

Abstract

Malicious domains are increasingly common and pose a severe cybersecurity threat. In particular, many current cyber attacks use URLs for attack communication (e.g., command-and-control (C&C), phishing, and spear-phishing). Despite continuous progress in detecting these attacks, many alarming problems remain open, such as weak spots in the defense mechanisms themselves. Because machine learning (ML) has become one of the most prominent methods of malware detection, we propose a robust feature selection mechanism that yields malicious domain detection models resistant to black-box evasion attacks. This paper makes two main contributions. First, our mechanism achieves high performance on data collected from ~5000 benign active URLs and ~1350 malicious active (attack) URLs. Second, we provide an analysis of robust feature selection based on features widely used in the literature. Notably, even though we cut the feature space by more than half (from nine to four features), the classifier's performance still improves: its F1-score rises from 92.92% to 95.81%. That our models are robust to malicious perturbations while remaining effective on clean data demonstrates the value of training a model solely on robust features.
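The abstract's central argument is that a model trained only on robust features cannot be steered by perturbations to the features an attacker controls. The Python sketch below illustrates that reasoning on toy data. Everything in it is hypothetical: the nine feature names f1 through f9, the choice of f1 to f4 as the robust subset, the nearest-centroid classifier, and the `evade` function (an attacker who resets every non-robust feature to a benign-looking value) are placeholders, not the paper's actual features, model, or attack model.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature names: the abstract does not list the nine features.
ALL_FEATURES: List[str] = [f"f{i}" for i in range(1, 10)]  # nine candidate features
# Assumed robust subset of four features (placeholder choice).
ROBUST_FEATURES: List[str] = ["f1", "f2", "f3", "f4"]

Sample = Dict[str, float]


def project(sample: Sample, keep: Sequence[str]) -> List[float]:
    """Keep only the selected features, in a fixed order."""
    return [sample[name] for name in keep]


class NearestCentroid:
    """Toy classifier: predict the label whose class mean is closest (squared Euclidean)."""

    def fit(self, X: List[List[float]], y: List[int]) -> "NearestCentroid":
        self.centroids: Dict[int, List[float]] = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X: List[List[float]]) -> List[int]:
        def dist(a: List[float], b: List[float]) -> float:
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c])) for x in X]


def f1_score(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """F1 = 2 * precision * recall / (precision + recall) for the positive (malicious) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evade(sample: Sample, attacker_controlled: set) -> Sample:
    """Crude evasion stand-in: reset every attacker-controlled feature to a benign-looking 0.0."""
    return {k: (0.0 if k in attacker_controlled else v) for k, v in sample.items()}


def make_samples(values: List[float], label: int) -> List[Tuple[Sample, int]]:
    """One toy sample per value, with every feature set to that value."""
    return [({name: v for name in ALL_FEATURES}, label) for v in values]


# Toy data: benign (label 0) near 0.0, malicious (label 1) near 1.0.
data = make_samples([0.0, 0.1, 0.2], 0) + make_samples([0.8, 0.9, 1.0], 1)
samples = [s for s, _ in data]
labels = [t for _, t in data]

# Assumption: the attacker can change the non-robust features but not the robust ones.
attacker_controlled = set(ALL_FEATURES) - set(ROBUST_FEATURES)
# Only malicious samples are perturbed; benign samples stay clean.
evaded = [evade(s, attacker_controlled) if t == 1 else s for s, t in data]

results: Dict[str, Tuple[float, float]] = {}
for name, keep in [("all nine", ALL_FEATURES), ("robust four", ROBUST_FEATURES)]:
    model = NearestCentroid().fit([project(s, keep) for s in samples], labels)
    clean_f1 = f1_score(labels, model.predict([project(s, keep) for s in samples]))
    evaded_f1 = f1_score(labels, model.predict([project(s, keep) for s in evaded]))
    results[name] = (clean_f1, evaded_f1)
    print(f"{name}: clean F1={clean_f1:.2f}, evaded F1={evaded_f1:.2f}")
```

Run as-is, it prints a clean F1 of 1.00 for both models; under evasion the nine-feature model drops to 0.00 while the four-feature model stays at 1.00, because the perturbed features never reach it. The toy evaluates on its own training data, so these numbers only illustrate the mechanism and are unrelated to the F1-scores reported in the abstract.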

Keywords

Malware detection, Robust features, Domain

Notes

Acknowledgement

This work was supported by the Ariel Cyber Innovation Center in conjunction with the Israel National Cyber Directorate in the Prime Minister’s Office, and by the Data Science and Artificial Intelligence Research Center at Ariel University.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Computer Science, Ariel University, Ariel, Israel
  2. Department of Industrial Engineering and Management, Ariel University, Ariel, Israel
  3. Ariel Cyber Innovation Center, Ariel University, Ariel, Israel
  4. Data Science and Artificial Intelligence Research Center, Ariel University, Ariel, Israel