A Machine Learning Framework for Studying Domain Generation Algorithm (DGA)-Based Malware

  • Tommy Chin
  • Kaiqi Xiong
  • Chengbin Hu
  • Yi Li
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 254)


Malware or threat actors use a Command and Control (C2) environment to proliferate and manage an attack. In a sophisticated attack, a threat actor often employs a Domain Generation Algorithm (DGA) to cycle the network location at which malware communicates with the C2. Network security controls such as blacklisting, implementing a DNS sinkhole, or inserting a firewall rule are vital assets to an organization’s security posture. However, all of them are typically ineffective against a DGA. In this paper, we propose a machine learning framework for identifying and clustering domain names to circumvent threats from a DGA. We collect a real-time threat intelligence feed over a six-month period, in which all domains were active threats on the public Internet at the time of collection. We then apply the proposed machine learning framework to study DGA-based malware. The proposed framework contains a two-level model consisting of classification and clustering: the first level detects DGA domains, and the second level identifies the DGA that generated those domains. Our extensive experimental results demonstrate the accuracy of the proposed framework. To be precise, we achieve accuracies of 95.14% for the first-level classification and 92.45% for the second-level clustering.
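The two-level idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the features, classifier, or clustering algorithm used, so everything below is an assumption made for illustration: the first level is a hypothetical entropy-and-length rule standing in for a trained classifier, the second level is a simple leader-clustering pass over three hypothetical features, and all names and thresholds (`looks_generated`, `leader_cluster`, `entropy_threshold`, `min_length`, `eps`) are invented for this example.

```python
import math
from collections import Counter


def label_of(domain):
    """Return the leftmost label of a domain name, lower-cased."""
    return domain.lower().split(".")[0]


def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_generated(domain, entropy_threshold=3.5, min_length=12):
    """Level 1 stand-in: flag long, high-entropy labels as DGA-like.

    Both thresholds are hypothetical values picked for this sketch;
    a real system would learn a classifier from labelled data.
    """
    label = label_of(domain)
    return len(label) >= min_length and shannon_entropy(label) >= entropy_threshold


def features(domain):
    """Hypothetical feature vector: (length, entropy, digit fraction).

    Only called on flagged domains, whose labels are never empty.
    """
    label = label_of(domain)
    digits = sum(ch.isdigit() for ch in label)
    return (float(len(label)), shannon_entropy(label), digits / len(label))


def leader_cluster(domains, eps=2.0):
    """Level 2 stand-in: leader clustering with Euclidean distance.

    A domain joins the first cluster whose leader lies within eps of
    its feature vector; otherwise it starts a new cluster.
    """
    clusters = []  # list of (leader_features, members)
    for domain in domains:
        vec = features(domain)
        for leader, members in clusters:
            if math.dist(leader, vec) <= eps:
                members.append(domain)
                break
        else:
            clusters.append((vec, [domain]))
    return [members for _, members in clusters]


def two_level(domains):
    """Run level 1, then cluster only the domains it flagged."""
    flagged = [d for d in domains if looks_generated(d)]
    return flagged, leader_cluster(flagged)
```

Calling `two_level` on a mixed list of domain names returns the flagged subset together with its grouping into candidate families. Keeping the two levels as separate functions mirrors the abstract's structure, so either stand-in could be swapped for a trained model without touching the other.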


Malware, Domain Generation Algorithm, Machine learning, Security, Networking



We thank the National Science Foundation (NSF) for partially sponsoring this research under grants #1633978, #1620871, #1636622, #1651280, and #1620862, and BBN/GPO project #1936 through an NSF/CNS grant. We also thank the Florida Center for Cybersecurity (FC2), located at the University of South Florida (USF), for supporting the research through its funding, which is open to all institutions in the State University System of Florida.

The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of NSF, FC2, or USF.



Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2018

Authors and Affiliations

  1. Department of Computing Security, Rochester Institute of Technology, Rochester, USA
  2. Florida Center for Cybersecurity, University of South Florida, Tampa, USA
