Conservation of Feature Sub-spaces Across Rootkit Sub-families

  • Prasenjit Das
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 805)


Modern malware detection systems rely largely on signatures to assign malware samples to their corresponding families. These signatures are fragments of code, and families of malware are believed to share commonalities in their signatures. We hypothesize that changes in these signatures give rise to newer malware sub-families. In the present work we evaluate signature conservation across two sub-families of rootkits. Our experiments were carried out to establish that features in the rootkit family of malware are conserved. With the Naïve Bayes classification algorithm, our extracted features yielded an accuracy of 84.17%. The results reinforce our belief that there are subsets of independent features that discriminate between sub-families but do not exhibit any trend of conservation. We conclude that certain features (if not all) are preserved and discriminate between sub-families.
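
As a rough illustration of the classification step named above, the following minimal sketch trains a Bernoulli Naïve Bayes model on binary feature vectors and measures accuracy on held-out samples. It is not the paper's implementation: the toy data, the sub-family labels "A" and "B", the four features, and all function names are hypothetical, and the numbers it produces have no relation to the reported 84.17%.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(samples, labels, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary feature vectors."""
    n_features = len(samples[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for x, y in zip(samples, labels):
        class_counts[y] += 1
        for j, v in enumerate(x):
            feature_counts[y][j] += v
    model = {}
    for y, n_y in class_counts.items():
        log_prior = math.log(n_y / len(samples))
        # Laplace-smoothed estimate of P(feature_j = 1 | class y)
        probs = [(feature_counts[y][j] + alpha) / (n_y + 2 * alpha)
                 for j in range(n_features)]
        model[y] = (log_prior, probs)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior for vector x."""
    best_label, best_score = None, -math.inf
    for y, (log_prior, probs) in model.items():
        score = log_prior
        for v, p in zip(x, probs):
            score += math.log(p if v else 1.0 - p)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


def accuracy(model, samples, labels):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(model, x) == y for x, y in zip(samples, labels))
    return correct / len(samples)


# Hypothetical toy data: features 0 and 1 occur in both sub-families,
# feature 2 occurs mostly in "A" and feature 3 mostly in "B".
train_x = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 0, 1, 0],
           [1, 1, 0, 1], [1, 1, 0, 1], [0, 1, 0, 1]]
train_y = ["A", "A", "A", "B", "B", "B"]
test_x = [[1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
test_y = ["A", "B", "A", "B"]

model = train_bernoulli_nb(train_x, train_y)
acc = accuracy(model, test_x, test_y)
print(f"held-out accuracy: {acc:.2f}")
```

In this toy setting, features 2 and 3 carry the discrimination between the two labels, while features 0 and 1 are shared by both and contribute little to the decision.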


Data mining, Malware, Rootkit, Classification, Clustering, Bi-clustering



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Chitkara University, Baddi, India
