Learning Attack Features from Static and Dynamic Analysis of Malware

Ravula, Ravinder R.; Liszka, Kathy J.; Chan, Chien-Chung

doi:10.1007/978-3-642-37186-8_7

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 348)

Abstract

Malware detection is a major challenge in today's software security profession. Existing work on malware detection relies on static analysis features such as function length frequency, printable string information, byte sequences, and API calls. Other work applies dynamic analysis using features such as function call arguments, return values, and dynamic API call sequences. In this work, we applied a reverse engineering process to extract static and behavioral features from malware, based on the assumption that the behavior of a malware sample can be revealed by executing it and observing its effects on the operating environment. We ran each executable in an isolated environment and captured all of its activity, including registry activity, file system activity, network activity, API calls made, and DLLs accessed. Using the features extracted by the reverse engineering process together with static analysis features, we prepared two datasets and applied data mining algorithms to generate classification rules. Essential features are identified by applying Weka's J48 decision tree classifier to 1103 software samples (582 malware and 521 benign) collected from the Internet. The performance of all classifiers is evaluated by 5-fold cross validation with 80-20 splits of the training sets. Experimental results show that the Naïve Bayes classifier performs better on the smaller dataset with 15 reverse-engineered features, while J48 performs better on the dataset created from the API call data, which has 141 features. In addition, we applied BLEM2, a rough set based tool, to generate rules and evaluate its identification of reverse-engineered features in contrast to decision trees. Preliminary results indicate that BLEM2 rules may provide interesting insights for essential feature identification.
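
The evaluation protocol described above can be sketched as follows. This is a minimal, self-contained illustration, not the authors' Weka setup. It assumes each executable has already been reduced to a binary vector marking whether a given behavior (for example, a particular API call or DLL access) was observed; it uses a from-scratch Bernoulli Naïve Bayes classifier with add-one smoothing in place of Weka's Naïve Bayes; and it reads "5-fold cross validation with 80-20 splits" as five folds, each training on about 80% of the samples and testing on the remaining 20%. The feature names in `FEATURES` and the ten-sample `TOY_DATA` set are hypothetical and are not taken from the paper's dataset.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical behavioral features (one binary column each); not the paper's feature set.
FEATURES = ["RegSetValueEx", "writes_system32", "opens_socket", "loads_wininet_dll"]

# (binary feature vector, label) where label 1 = malware, 0 = benign.
Sample = Tuple[Tuple[int, ...], int]

# Invented toy samples for illustration only; NOT data from the paper.
TOY_DATA: List[Sample] = [
    ((1, 1, 1, 0), 1), ((1, 1, 1, 1), 1), ((1, 1, 0, 1), 1), ((1, 0, 1, 0), 1), ((0, 1, 1, 1), 1),
    ((0, 0, 0, 1), 0), ((0, 0, 0, 0), 0), ((1, 0, 0, 1), 0), ((0, 0, 1, 0), 0), ((0, 1, 0, 1), 0),
]

# Per class: (log prior, list of P(feature j observed | class)).
Model = Dict[int, Tuple[float, List[float]]]


def train_bernoulli_nb(data: Sequence[Sample]) -> Model:
    """Fit Bernoulli Naive Bayes with add-one (Laplace) smoothing on binary features."""
    n_features = len(data[0][0])
    class_counts = {0: 0, 1: 0}
    feature_ones = {0: [0] * n_features, 1: [0] * n_features}
    for x, y in data:
        class_counts[y] += 1
        for j, v in enumerate(x):
            feature_ones[y][j] += v
    model: Model = {}
    for c in (0, 1):
        prior = (class_counts[c] + 1) / (len(data) + 2)
        # Smoothed so a probability is never exactly 0 or 1 (avoids log(0)).
        probs = [(feature_ones[c][j] + 1) / (class_counts[c] + 2) for j in range(n_features)]
        model[c] = (math.log(prior), probs)
    return model


def predict(model: Model, x: Sequence[int]) -> int:
    """Return the class with the highest log-posterior score (ties go to class 0)."""
    best_class, best_score = 0, -math.inf
    for c, (log_prior, probs) in model.items():
        score = log_prior + sum(math.log(p if v else 1.0 - p) for v, p in zip(x, probs))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def k_fold_accuracy(data: Sequence[Sample], k: int = 5, seed: int = 0) -> List[float]:
    """k-fold cross validation; with k = 5 each fold trains on ~80% and tests on ~20%."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # disjoint test folds covering every sample once
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [data[j] for j in idx if j not in held_out]
        test = [data[j] for j in test_idx]
        model = train_bernoulli_nb(train)
        correct = sum(predict(model, x) == y for x, y in test)
        scores.append(correct / len(test))
    return scores


if __name__ == "__main__":
    print("features:", FEATURES)
    fold_scores = k_fold_accuracy(TOY_DATA, k=5)
    print("per-fold accuracy:", fold_scores)
    print("mean accuracy:", sum(fold_scores) / len(fold_scores))
```

Replacing `train_bernoulli_nb` and `predict` with a decision tree learner would give the kind of J48 versus Naïve Bayes comparison the abstract reports; the cross-validation loop itself stays the same.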

Author information

Authors and Affiliations

Department of Computer Science, University of Akron, Akron, OH, U.S.A.
Ravinder R. Ravula, Kathy J. Liszka & Chien-Chung Chan

Editor information

Editors and Affiliations

IST - Technical University of Lisbon, Av. Rovisco Pais, 1, 1049-001, Lisbon, Portugal
Ana Fred
Delft University of Technology, Mekelweg 4, 2628 CD, Delft, The Netherlands
Jan L. G. Dietz
Informatics Research Centre, Henley Business School, University of Reading, RG6 6UD, Reading, UK
Kecheng Liu
INSTICC and IPS, Estefanilha, Setúbal, Portugal
Joaquim Filipe

About this paper

Cite this paper

Ravula, R.R., Liszka, K.J., Chan, CC. (2013). Learning Attack Features from Static and Dynamic Analysis of Malware. In: Fred, A., Dietz, J.L.G., Liu, K., Filipe, J. (eds) Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2011. Communications in Computer and Information Science, vol 348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37186-8_7

DOI: https://doi.org/10.1007/978-3-642-37186-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37185-1
Online ISBN: 978-3-642-37186-8
eBook Packages: Computer Science, Computer Science (R0)