Scalable Telemetry Classification for Automated Malware Detection

Stokes, Jack W.; Platt, John C.; Wang, Helen J.; Faulhaber, Joe; Keller, Jonathan; Marinescu, Mady; Thomas, Anil; Gheorghescu, Marius

doi:10.1007/978-3-642-33167-1_45

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 7459)

Included in the following conference series: European Symposium on Research in Computer Security

Abstract

Industry reports and blogs have estimated the amount of malware based on known malicious files. This paper extends this analysis to the amount of unknown malware. The study is based on 26.7 million files referenced in telemetry reports from 50 million computers running commercial anti-malware (AM) products. To estimate the undetected malware, a classifier predicts the underlying nature of unknown files recorded in the telemetry reports. The telemetry classifier predicts that 69.6% (4.27 million) of the unknown files are malicious. Assuming the unknown files predicted to be malicious by the classifier are malware, the telemetry classifier also allows us to estimate the efficacy of the AM system indicating that signatures detected 82.8% (20.6 million) of the malicious files. We have validated our system by conducting a longitudinal study to measure the false positive and false negative rates over a period of thirteen months.
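
The two headline percentages in the abstract can be checked against each other with a few lines of arithmetic. The sketch below is only a consistency check on the quoted figures, not the authors' method; the variable and function names are illustrative, and it assumes that "malicious files" in the efficacy figure means signature-detected files plus unknown files predicted to be malicious.

```python
# Figures quoted in the abstract, in millions of files.
detected_malicious = 20.6    # malicious files detected by signatures
predicted_malicious = 4.27   # unknown files the classifier predicts are malicious
predicted_share = 0.696      # fraction of unknown files predicted malicious


def signature_efficacy(detected: float, predicted: float) -> float:
    """Share of all malicious files (detected + predicted) caught by signatures.

    Assumption: the denominator is the sum of the two counts above.
    """
    return detected / (detected + predicted)


def implied_unknown_total(predicted: float, share: float) -> float:
    """Total number of unknown files implied by the predicted count and its share."""
    return predicted / share


efficacy = signature_efficacy(detected_malicious, predicted_malicious)
unknown_total = implied_unknown_total(predicted_malicious, predicted_share)

print(f"signature efficacy: {efficacy:.1%}")
print(f"implied unknown files: {unknown_total:.2f} million")
```

Under that assumption the computed efficacy rounds to the 82.8% reported in the abstract, and the implied pool of unknown files is a little over 6 million.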

Author information

Authors and Affiliations

Microsoft Research, Redmond, WA, 98052, USA
Jack W. Stokes, John C. Platt & Helen J. Wang
Microsoft Corp., Redmond, WA, 98052, USA
Joe Faulhaber, Jonathan Keller, Mady Marinescu, Anil Thomas & Marius Gheorghescu

Editor information

Editors and Affiliations

Dipartimento di Informatica, Università degli Studi di Milano, Via Bramante 65, 26013, Crema, Italy
Sara Foresti
Computer Science Department, Columbia University, 1214 Amsterdam Avenue, 10025, New York, NY, US
Moti Yung
Institute of Informatics and Telematics, Information Security Group, National Research Council, Pisa Research Area, Via G. Moruzzi 1, 56125, Pisa, Italy
Fabio Martinelli

About this paper

Cite this paper

Stokes, J.W. et al. (2012). Scalable Telemetry Classification for Automated Malware Detection. In: Foresti, S., Yung, M., Martinelli, F. (eds) Computer Security – ESORICS 2012. ESORICS 2012. Lecture Notes in Computer Science, vol 7459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33167-1_45

DOI: https://doi.org/10.1007/978-3-642-33167-1_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33166-4
Online ISBN: 978-3-642-33167-1
eBook Packages: Computer Science, Computer Science (R0)
