Malware Phylogenetics Based on the Multiview Graphical Lasso

  • Blake Anderson
  • Terran Lane
  • Curtis Hash
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8819)


Malware phylogenetics has gained considerable traction over the past several years. More recently, researchers have begun using directed acyclic graphs (DAGs) to model the evolutionary relationships between malware samples. Phylogenetic graphs give analysts a better understanding of how malware has evolved by clearly illustrating the lineage of a given family. In this paper, we present a novel algorithm based on the graphical lasso. We extend the graphical lasso to incorporate multiple views, both static and dynamic, of malware. For each program family, a convex combination of the views is found such that the graphical lasso objective function is maximized. Learning the weights of each view on a per-family basis, as opposed to treating all views as one extended feature vector, is essential in the malware domain because different families employ different obfuscation strategies, which limit the information available from different views. We demonstrate results on three malicious families and two benign families for which the ground truth is known.
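The idea of scoring a convex combination of views against the graphical lasso objective can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the optimization procedure, so the sketch assumes each view is supplied as a symmetric positive semidefinite similarity matrix over the same samples, uses the standard penalized log-likelihood log det(Θ) - tr(SΘ) - λ‖Θ‖₁, and substitutes a simple proximal-gradient loop for a real graphical lasso solver. All function names, the synthetic features, and the parameter values are hypothetical.

```python
import numpy as np


def combine_views(view_mats, beta):
    """Convex combination sum_v beta_v * S_v of per-view matrices."""
    beta = np.asarray(beta, dtype=float)
    if beta.shape != (len(view_mats),) or np.any(beta < 0) or not np.isclose(beta.sum(), 1.0):
        raise ValueError("need one non-negative weight per view, summing to 1")
    return sum(b * np.asarray(m, dtype=float) for b, m in zip(beta, view_mats))


def glasso_objective(theta, s, lam):
    """log det(theta) - tr(s @ theta) - lam * ||theta||_1 (-inf if theta is not positive definite)."""
    try:
        chol = np.linalg.cholesky(theta)
    except np.linalg.LinAlgError:
        return -np.inf
    log_det = 2.0 * np.log(np.diag(chol)).sum()
    return log_det - np.trace(s @ theta) - lam * np.abs(theta).sum()


def soft_threshold(a, t):
    """Elementwise proximal operator of t * ||.||_1."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)


def fit_precision(s, lam, n_iter=200, step=0.1):
    """Assumed stand-in solver: proximal gradient with backtracking, not a tuned glasso."""
    theta = np.diag(1.0 / (np.diag(s) + lam))  # positive definite diagonal start
    current = glasso_objective(theta, s, lam)
    for _ in range(n_iter):
        grad = s - np.linalg.inv(theta)  # gradient of -log det(theta) + tr(s @ theta)
        t = step
        while t > 1e-12:
            cand = soft_threshold(theta - t * grad, t * lam)
            cand = 0.5 * (cand + cand.T)  # remove rounding asymmetry
            value = glasso_objective(cand, s, lam)
            if value >= current:  # accept only steps that do not lower the objective
                break
            t *= 0.5
        else:
            return theta  # no acceptable step found; treat as converged
        theta, current = cand, value
    return theta


def score_weights(view_mats, beta, lam):
    """Objective value reached for one candidate weight vector, plus the fitted precision matrix."""
    s = combine_views(view_mats, beta)
    theta = fit_precision(s, lam)
    return glasso_objective(theta, s, lam), theta


# Hypothetical data: 5 samples, one "static" and one "dynamic" feature view.
rng = np.random.default_rng(0)
static_feats = rng.normal(size=(5, 8))
dynamic_feats = rng.normal(size=(5, 8))
views = [static_feats @ static_feats.T / 8, dynamic_feats @ dynamic_feats.T / 8]
lam = 0.2

for beta in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    value, _ = score_weights(views, beta, lam)
    print(beta, round(value, 4))
```

The loop at the end only compares three hand-picked weight vectors for a single family. A per-family method as described above would search over the weights rather than enumerate a few candidates, and the nonzero off-diagonal entries of the fitted precision matrix would then be read as edges between samples.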


Keywords: Gaussian Graphical Models, Malware, Multiview Learning





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Blake Anderson (1)
  • Terran Lane (2)
  • Curtis Hash (1)

  1. Los Alamos National Laboratory, USA
  2. Google, Inc., USA
