MtNet: A Multi-Task Neural Network for Dynamic Malware Classification

  • Wenyi Huang
  • Jack W. Stokes
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9721)


In this paper, we propose a new multi-task, deep learning architecture for the binary (i.e., malware versus benign) malware classification task. All models are trained with data extracted from dynamic analysis of malicious and benign files. For the first time, we see improvements using multiple layers in a deep neural network architecture for malware classification. The system is trained on 4.5 million files and tested on a holdout test set of 2 million files, which makes this the largest study to date. To achieve a binary classification error rate of 0.358%, the objective functions for the binary classification task and the malware family classification task are combined in the multi-task architecture. In addition, we propose a standard (i.e., non-multi-task) malware family classification architecture, which achieves a malware family classification error rate of 2.94%.
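
As a rough illustration of what combining the two objective functions could look like, the sketch below adds a cross-entropy loss for the binary head to a cross-entropy loss for the family head. The abstract does not say how the objectives are combined, so the weighted sum, the `family_weight` parameter, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[label])


def multitask_loss(binary_logits, family_logits, is_malware, family,
                   family_weight=1.0):
    """Combine the two task losses for a single training example.

    Assumption: the objectives are combined as a weighted sum.
    `family_weight` is a hypothetical knob, not taken from the paper.
    """
    binary_loss = cross_entropy(binary_logits, is_malware)
    family_loss = cross_entropy(family_logits, family)
    return binary_loss + family_weight * family_loss


# Example: outputs of two heads that would share the same hidden layers.
loss = multitask_loss(
    binary_logits=[0.2, 1.5],        # [benign, malware]
    family_logits=[0.1, 2.0, -0.3],  # three hypothetical families
    is_malware=1,
    family=1,
)
```

In a sketch like this, both heads would sit on top of shared hidden layers, so gradients from the family task also shape the representation used for the binary decision.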



The authors would like to thank Mady Marinescu for helping with the data collection. We also thank our shepherd Juan Tapiador and the anonymous reviewers for their very valuable feedback.



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Information Sciences and Technology, Pennsylvania State University, University Park, USA
  2. Microsoft Research, Redmond, USA
