Detecting Trojans Using Data Mining Techniques

Siddiqui, Muazzam; Wang, Morgan C.; Lee, Joohan

doi:10.1007/978-3-540-89853-5_43


Part of the book series: Communications in Computer and Information Science (CCIS, volume 20)

Included in the following conference series:

International Multi Topic Conference

Abstract

A trojan horse is a program that surreptitiously performs its operation under the guise of a legitimate program. Traditional signature-based detection is largely ineffective against new and unseen samples whose signatures are not yet available. The focus of malware research is therefore shifting from matching signature patterns to identifying the malicious behavior that malware displays. This paper presents the novel idea of extracting variable-length instruction sequences that can distinguish trojans from clean programs using data mining techniques. The analysis is facilitated by the program control flow information contained in the instruction sequences. Based on general statistics gathered from these instruction sequences, we formulated the problem as a binary classification problem and built random forest, bagging and support vector machine classifiers. Our approach showed a 94.0% detection rate on novel trojans whose data was not used in the model building process.
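
The feature-extraction idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how sequences are delimited, so the sketch assumes that each sequence ends at a control-flow instruction, and the `CONTROL_FLOW` set, the function names, and the toy opcode listings are all hypothetical. It turns each program into a vector of sequence counts, which is the kind of input a binary classifier would need.

```python
from collections import Counter

# Hypothetical set of control-flow mnemonics used to cut the listing into
# sequences; the paper's actual segmentation rule is not given in the abstract.
CONTROL_FLOW = {"jmp", "je", "jne", "jz", "jnz", "call", "ret", "loop"}


def instruction_sequences(opcodes):
    """Split a flat opcode list into variable-length sequences.

    Each sequence ends at (and includes) a control-flow instruction, so its
    length depends on where the program branches.
    """
    sequences, current = [], []
    for op in opcodes:
        current.append(op.lower())
        if op.lower() in CONTROL_FLOW:
            sequences.append(tuple(current))
            current = []
    if current:  # trailing instructions with no closing branch
        sequences.append(tuple(current))
    return sequences


def build_vocabulary(programs):
    """Collect every distinct sequence seen across the programs, sorted."""
    vocab = set()
    for opcodes in programs:
        vocab.update(instruction_sequences(opcodes))
    return sorted(vocab)


def feature_vector(opcodes, vocabulary):
    """Count how often each vocabulary sequence occurs in one program."""
    counts = Counter(instruction_sequences(opcodes))
    return [counts[seq] for seq in vocabulary]


# Toy opcode listings standing in for disassembled programs (illustrative only).
clean = ["push", "mov", "call", "add", "ret"]
suspect = ["push", "mov", "call", "xor", "jmp", "push", "mov", "call"]

vocabulary = build_vocabulary([clean, suspect])
X = [feature_vector(p, vocabulary) for p in (clean, suspect)]
y = [0, 1]  # 0 = clean, 1 = trojan
```

The rows of `X` with the labels in `y` form a table that could be passed to a random forest, bagging, or support vector machine learner in any statistics package. A real study would need many labeled samples and a held-out test set of unseen trojans, as the abstract describes.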

Author information

Authors and Affiliations

University of Central Florida, USA
Muazzam Siddiqui, Morgan C. Wang & Joohan Lee

Editor information

Editors and Affiliations

Department of Software Engineering & Media Technology, Aalborg University, Niels Bohrs Vej 8, 6700, Esbjerg, Denmark
D. M. Akbar Hussain
Mehran University of Engineering & Technology, Jamshoro, Pakistan
Abdul Qadeer Khan Rajput
Department of Electronics and Telecommunication Engineering, Faculty of Electrical, Electronics & Computer Engineering, Mehran UET, Jamshoro, Pakistan
Bhawani Shankar Chowdhry
Learning Societies Lab, Electronics and Computer Science, University of Southampton, United Kingdom
Quintin Gee

Cite this paper

Siddiqui, M., Wang, M.C., Lee, J. (2008). Detecting Trojans Using Data Mining Techniques. In: Hussain, D.M.A., Rajput, A.Q.K., Chowdhry, B.S., Gee, Q. (eds) Wireless Networks, Information Processing and Systems. IMTIC 2008. Communications in Computer and Information Science, vol 20. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89853-5_43

DOI: https://doi.org/10.1007/978-3-540-89853-5_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89852-8
Online ISBN: 978-3-540-89853-5
eBook Packages: Computer Science (R0)
