Malicious Codes Detection Based on Ensemble Learning
As malicious code becomes more complex and sophisticated, scanning-based detection is no longer able to detect the many forms of viruses effectively. In this paper, we explore a solution that is based on the fusion of multiple classifiers and does not depend strictly on any particular malicious code. Motivated by the standard signature-based technique for detecting viruses, we explore the idea of automatically detecting malicious code using n-gram analysis. After features are selected by information gain, probabilistic neural networks are used to build and test the proposed multi-classifier system. Each individual classifier produces classification evidence, and this evidence is then combined with the Dempster-Shafer combination rule to form the final classification of new malicious code. Experimental results show that the proposed detection engine improves on the classification results produced by the individual classifiers.
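The abstract describes a pipeline of n-gram extraction, information-gain feature selection, probabilistic neural network (PNN) classifiers, and Dempster-Shafer fusion of their evidence. The Python sketch below illustrates those steps under several assumptions that are not taken from the paper: features are binary presence indicators for byte n-grams, the PNN assumes equal class priors and a single kernel spread `sigma`, and the hypothetical helper `to_mass` turns a classifier's posteriors into a mass function by discounting them with a per-classifier reliability weight and assigning the remainder to the whole frame (ignorance). All function names are hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter
from functools import reduce

MAL, BEN = "malicious", "benign"
THETA = frozenset({MAL, BEN})  # frame of discernment {malicious, benign}


def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Distinct byte n-grams of a file, used as binary presence features."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(labels) -> float:
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def information_gain(samples, labels, gram) -> float:
    """IG of the binary feature 'gram occurs in the sample' w.r.t. the class label."""
    present = [y for s, y in zip(samples, labels) if gram in s]
    absent = [y for s, y in zip(samples, labels) if gram not in s]
    n = len(labels)
    conditional = len(present) / n * entropy(present) + len(absent) / n * entropy(absent)
    return entropy(labels) - conditional


def select_features(samples, labels, k: int) -> list:
    """Keep the k n-grams with the highest information gain (ties broken by value)."""
    vocab = set().union(*samples)
    ranked = sorted(vocab, key=lambda g: (-information_gain(samples, labels, g), g))
    return ranked[:k]


def vectorize(sample: set, features: list) -> list:
    return [1.0 if g in sample else 0.0 for g in features]


def pnn_posteriors(x, train_X, train_y, sigma: float = 0.5) -> dict:
    """PNN: mean Gaussian kernel per class (pattern + summation layers), normalised.
    Assumes equal class priors."""
    scores = {}
    for cls in (MAL, BEN):
        members = [v for v, y in zip(train_X, train_y) if y == cls]
        kernel_sum = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, v)) / (2 * sigma ** 2))
            for v in members
        )
        scores[cls] = kernel_sum / max(len(members), 1)
    total = sum(scores.values())
    return {c: (s / total if total > 0 else 0.0) for c, s in scores.items()}


def to_mass(posteriors: dict, reliability: float) -> dict:
    """Hypothetical evidence mapping: discount the posteriors by a reliability in
    [0, 1] and give the remaining mass to THETA (ignorance)."""
    mass = {frozenset({c}): reliability * p for c, p in posteriors.items()}
    mass[THETA] = 1.0 - sum(mass.values())
    return mass


def dempster_combine(m1: dict, m2: dict) -> dict:
    """Dempster's rule: multiply masses, drop empty intersections (conflict K),
    and renormalise the rest by 1 - K."""
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the evidence cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def fuse_and_decide(masses: list):
    """Fuse the evidence of all individual classifiers and pick the best-supported class."""
    fused = reduce(dempster_combine, masses)
    label = max((MAL, BEN), key=lambda c: fused.get(frozenset({c}), 0.0))
    return label, fused
```

In use, each individual classifier would be trained on its own feature set (for example a different n-gram length or feature subset, which is again an assumption), its posteriors for a new file converted with `to_mass`, and the resulting list of mass functions passed to `fuse_and_decide`. Keeping some mass on `THETA` lets a weak classifier express uncertainty instead of forcing a hard vote, which is the main reason to prefer Dempster-Shafer fusion over simple averaging in this setting.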
Keywords: Information Gain; Individual Classifier; Probabilistic Neural Network; Ensemble Learning; Malicious Code