Deployable Classifiers for Malware Detection

Singh, Anshuman; Singh, Sumi; Walenstein, Andrew; Lakhotia, Arun

doi:10.1007/978-3-642-29166-1_34

Part of the book series: Communications in Computer and Information Science (CCIS, volume 285)

Included in the following conference series:

International Conference on Information Systems, Technology and Management

Abstract

The application of machine learning methods to malware detection has opened up the possibility of generating a large number of classifiers that use different kinds of features and learning algorithms. A straightforward way to select the best classifier is to pick the one with the best holdout or cross-validation performance. However, cross-validation or holdout gives only a point estimate of generalization performance, and that estimate varies with the training data and the learning algorithm's parameters. We propose a classifier selection criterion that considers bounds on the performance estimates, using confidence intervals in conjunction with a performance target. Performance targets are commonly used in practice for classifier selection, particularly in security applications like malware detection. The proposed criterion, called deployability, selects a classifier as deployable if the cost target lies within or above the classifier’s expected cost confidence interval. We conducted an experiment with machine learning based malware detectors to evaluate the criterion. We found that, for a given confidence level and cost target, even the classifier with the least expected cost may not be deployable, while classifiers with higher expected cost may be deployable.
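The deployability criterion described in the abstract can be sketched in a few lines of Python. This is only one reading of the criterion, not the authors' implementation: the abstract does not say how the confidence interval is computed, so the normal-approximation interval over per-fold costs, the function names `cost_confidence_interval` and `is_deployable`, and the per-fold cost values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist, mean, stdev


@dataclass
class CostInterval:
    lower: float
    upper: float


def cost_confidence_interval(fold_costs, confidence=0.95):
    """Normal-approximation interval for the mean per-fold cost (an assumption;
    the abstract does not specify the interval construction)."""
    if len(fold_costs) < 2:
        raise ValueError("need at least two cost estimates")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    centre = mean(fold_costs)
    half_width = z * stdev(fold_costs) / sqrt(len(fold_costs))
    return CostInterval(centre - half_width, centre + half_width)


def is_deployable(fold_costs, cost_target, confidence=0.95):
    """Deployable if the cost target lies within or above the interval,
    i.e. the target is not below the interval's lower bound."""
    interval = cost_confidence_interval(fold_costs, confidence)
    return cost_target >= interval.lower


# Hypothetical per-fold expected costs, invented purely for illustration.
stable_costs = [0.10, 0.10, 0.11, 0.10, 0.11]    # lower mean, narrow interval
variable_costs = [0.02, 0.20, 0.05, 0.25, 0.08]  # higher mean, wide interval
cost_target = 0.05

stable_ok = is_deployable(stable_costs, cost_target)
variable_ok = is_deployable(variable_costs, cost_target)
```

With these invented numbers, the classifier with the lower mean cost has an interval that sits entirely above the target, while the higher-cost, higher-variance classifier has an interval that reaches down to the target. Under this reading, that is how the classifier with the least expected cost can fail the criterion while one with higher expected cost passes it.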

Author information

Authors and Affiliations

University of Louisiana at Lafayette, USA
Anshuman Singh, Sumi Singh, Andrew Walenstein & Arun Lakhotia

Editor information

Editors and Affiliations

Computer Science, College of Engineering and Science, Louisiana Tech University, 71272, Ruston, LA, USA
Sumeet Dua
Department of Information Systems, College of Engineering and Information Technology, UMBC, 1000 Hilltop Circle, 2125, Baltimore, MD, USA
Aryya Gangopadhyay
Department of Computer Science, The University of Manitoba, Winnipeg, MB, Canada
Parimala Thulasiraman
ISTI - CNR, Pisa, Italy
Umberto Straccia
Faculty of Computer Science, Dalhousie University, Halifax, B3H 1W5, Nova Scotia, Canada
Michael Shepherd
Faculty of Media: Media Systems, Bauhaus University Weimar, 99421, Weimar, Germany
Benno Stein

About this paper

Cite this paper

Singh, A., Singh, S., Walenstein, A., Lakhotia, A. (2012). Deployable Classifiers for Malware Detection. In: Dua, S., Gangopadhyay, A., Thulasiraman, P., Straccia, U., Shepherd, M., Stein, B. (eds) Information Systems, Technology and Management. ICISTM 2012. Communications in Computer and Information Science, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29166-1_34

DOI: https://doi.org/10.1007/978-3-642-29166-1_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29165-4
Online ISBN: 978-3-642-29166-1
eBook Packages: Computer Science, Computer Science (R0)
