Learning Attack Features from Static and Dynamic Analysis of Malware

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 348)

Abstract

Malware detection is a major challenge in software security today. Existing work on malware detection relies on static-analysis features such as function length frequency, printable string information, byte sequences, and API calls; other work applies dynamic analysis, using features such as function call arguments, return values, and dynamic API call sequences. In this work, we applied a reverse engineering process to extract static and behavioral features from malware, based on the assumption that a malware sample's behavior can be revealed by executing it and observing its effects on the operating environment. We ran each executable in an isolated environment and captured all of its activity, including registry activity, file system activity, network activity, API calls made, and DLLs accessed. Using the features extracted by the reverse engineering process together with static analysis features, we prepared two datasets and applied data mining algorithms to generate classification rules. Essential features are identified by applying Weka’s J48 decision tree classifier to 1103 software samples (582 malware and 521 benign) collected from the Internet. The performance of all classifiers is evaluated by 5-fold cross-validation, which yields 80/20 training/test splits. Experimental results show that the Naïve Bayes classifier performs better on the smaller dataset, which has 15 reverse-engineered features, while J48 performs better on the dataset built from API call data, which has 141 features. In addition, we applied BLEM2, a rough set based tool, to generate rules identifying essential reverse-engineered features and compared them with those of the decision trees. Preliminary results indicate that BLEM2 rules may provide interesting insights for essential feature identification.
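
The evaluation described in the abstract combines a decision tree (J48, Weka's implementation of C4.5), a Naïve Bayes classifier, and 5-fold cross-validation. The following is a minimal Python sketch of those three ingredients, not the authors' code and not Weka. It assumes each feature is binary (whether a sample exhibited a given registry, file, network, API, or DLL activity). It shows the information-gain computation underlying C4.5's gain-ratio split criterion (full tree induction and pruning are omitted), a Bernoulli Naïve Bayes classifier with add-one smoothing, and unstratified 5-fold cross-validation. The feature names (`writes_run_key`, `opens_socket`, `loads_wininet_dll`) and the ten-sample dataset are hypothetical illustrations and do not come from the paper.

```python
import math
import random
from collections import Counter


def kfold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold CV; k=5 gives ~80/20 splits.
    Folds are shuffled but, unlike a stratified split, not class-balanced."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def info_gain(X, y, f):
    """Information gain of splitting on binary feature f (basis of C4.5's gain ratio)."""
    gain = entropy(y)
    for v in (0, 1):
        subset = [lab for row, lab in zip(X, y) if row[f] == v]
        if subset:
            gain -= len(subset) / len(y) * entropy(subset)
    return gain


class BernoulliNB:
    """Naive Bayes over binary features with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            rows = [row for row, lab in zip(X, y) if lab == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # P(feature = 1 | class c), smoothed so no probability is exactly 0 or 1
            self.p[c] = [(sum(r[f] for r in rows) + 1) / (len(rows) + 2)
                         for f in range(len(X[0]))]
        return self

    def predict(self, row):
        def log_posterior(c):
            return self.log_prior[c] + sum(
                math.log(p if x else 1 - p) for x, p in zip(row, self.p[c]))
        return max(self.classes, key=log_posterior)


def cv_accuracy(X, y, k=5):
    """Fraction of samples classified correctly when each is held out exactly once."""
    correct = 0
    for train, test in kfold_indices(len(X), k):
        model = BernoulliNB().fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(model.predict(X[i]) == y[i] for i in test)
    return correct / len(X)


# Hypothetical toy data: label 1 = malware, 0 = benign.
FEATURES = ["writes_run_key", "opens_socket", "loads_wininet_dll"]
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0], [1, 1, 0],
     [0, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 0], [0, 1, 0]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Rank features by information gain, highest first.
ranking = sorted(range(len(FEATURES)), key=lambda f: info_gain(X, y, f), reverse=True)
print([FEATURES[f] for f in ranking])
print("5-fold CV accuracy:", cv_accuracy(X, y))
```

In this toy data only `writes_run_key` separates the classes, so it receives the highest information gain. The other two features are distributed identically in both classes and contribute nothing to either the ranking or the Naïve Bayes decision. With real data, the ranking would be computed per split inside the tree learner rather than once over the whole dataset.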

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ravula, R.R., Liszka, K.J., Chan, CC. (2013). Learning Attack Features from Static and Dynamic Analysis of Malware. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2011. Communications in Computer and Information Science, vol 348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37186-8_7

  • DOI: https://doi.org/10.1007/978-3-642-37186-8_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-37185-1

  • Online ISBN: 978-3-642-37186-8

  • eBook Packages: Computer Science, Computer Science (R0)
