Malrec: Compact Full-Trace Malware Recording for Retrospective Deep Analysis

Severi, Giorgio; Leek, Tim; Dolan-Gavitt, Brendan

doi:10.1007/978-3-319-93411-2_1

Giorgio Severi¹⁶,
Tim Leek¹⁷ &
Brendan Dolan-Gavitt¹⁸

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 10885))

Included in the following conference series:

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment

2030 Accesses
15 Citations

Abstract

Malware sandbox systems have become a critical part of the Internet’s defensive infrastructure. These systems allow malware researchers to quickly understand a sample’s behavior and effect on a system. However, current systems face two limitations: first, for performance reasons, the amount of data they can collect is limited (typically to system call traces and memory snapshots). Second, they lack the ability to perform retrospective analysis—that is, to later extract features of the malware’s execution that were not considered relevant when the sample was originally executed. In this paper, we introduce a new malware sandbox system, Malrec, which uses whole-system deterministic record and replay to capture high-fidelity, whole-system traces of malware executions with low time and space overheads. We demonstrate the usefulness of this system by presenting a new dataset of 66,301 malware recordings collected over a two-year period, along with two preliminary analyses that would not be possible without full traces: an analysis of kernel mode malware and exploits, and a fine-grained malware family classification based on textual memory access contents. The Malrec system and dataset can help provide a standardized benchmark for evaluating the performance of future dynamic analyses.

Tim Leek’s work is sponsored by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract #FA8721-05-C-0002 and/or #FA8702-15-D-0001. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 59.99; Price excludes VAT (USA)

Softcover Book: USD 74.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://malwr.com/.
2.
Note that this calculation does not include the cost of storing the recordings on some EC2-accessible storage medium such as Elastic Block Store.
3.
https://www.virustotal.com.
4.
https://www.elastic.co/products/elasticsearch.
5.
Virut is a malware botnet that spreads through html and executable files infection.
6.
https://cuckoosandbox.org/.
7.
https://malwr.com/.

References

Volatility command reference - GUI. https://github.com/volatilityfoundation/volatility/wiki/Command-Reference-Gui
Abadi, M.N., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mane, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viegas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X.: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv: 1603.04467 [cs], March 2016
Abadi, M., Budiu, M., Erlingsson, U., Ligatti, J.: Control-flow integrity. In: ACM Conference on Computer and Communications Security (2005)
Google Scholar
Balzarotti, D., Cova, M., Karlberger, C., Kirda, E., Kruegel, C., Vigna, G.: Efficient detection of split personalities in malware. In: NDSS (2010)
Google Scholar
Barak, B., Goldreich, O., Impagliazzo, R., Rudich, S., Sahai, A., Vadhan, S., Yang, K.: On the (im)possibility of obfuscating programs. In: Kilian, J. (ed.) CRYPTO 2001. LNCS, vol. 2139, pp. 1–18. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44647-8_1
Chapter Google Scholar
Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C., Kirda, E.: Scalable, behavior-based malware clustering. In: NDSS, vol. 9, pp. 8–11. Citeseer (2009)
Google Scholar
Bruening, D., Garnett, T., Amarasinghe, S.: An infrastructure for adaptive dynamic optimization. In: International Symposium on Code Generation and Optimization, CGO 2003, pp. 265–275. IEEE (2003)
Google Scholar
Chen, Y., Zhang, S., Guo, Q., Li, L., Wu, R., Chen, T.: Deterministic replay: a survey. ACM Comput. Surv. 48(2), 1–47 (2015)
Article Google Scholar
Chow, J., Garfinkel, T., Chen, P.M.: Decoupling dynamic program analysis from execution in virtual environments. In: USENIX 2008 Annual Technical Conference on Annual Technical Conference, pp. 1–14 (2008)
Google Scholar
Dinaburg, A., Royal, P., Sharif, M., Lee, W.: Ether: malware analysis via hardware virtualization extensions. In: Proceedings of the 15th ACM Conference on Computer and Communications Security, pp. 51–62. ACM (2008)
Google Scholar
Dolan-Gavitt, B., Hodosh, J., Hulin, P., Leek, T., Whelan, R.: Repeatable reverse engineering with PANDA. In: Program Protection and Reverse Engineering Workshop (PPREW), pp. 1–11. ACM Press (2015)
Google Scholar
Dolan-Gavitt, B., Leek, T., Hodosh, J., Lee, W.: Tappan zee (north) bridge: mining memory accesses for introspection. In: ACM Conference on Computer and Communications Security (CCS), pp. 839–850. ACM Press (2013)
Google Scholar
Dunlap, G.W., King, S.T., Cinar, S., Basrai, M.A., Chen, P.M.: ReVirt: enabling intrusion analysis through virtual-machine logging and replay. SIGOPS Oper. Syst. Rev. 36(SI), 211–224 (2002)
Article Google Scholar
Iyyer, M., Manjunatha, V., Boyd-Graber, J., Daumé III, H.: Deep unordered composition rivals syntactic methods for text classification. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (vol. 1: Long Papers), pp. 1681–1691 (2015)
Google Scholar
Kantchelian, A., Tschantz, M.C., Afroz, S., Miller, B., Shankar, V., Bachwani, R., Joseph, A.D., Tygar, J.D.: Better malware ground truth: techniques for weighting anti-virus vendor labels. In: ACM Workshop on Artificial Intelligence and Security (AISEC), pp. 45–56. ACM Press (2015)
Google Scholar
King, S.T., Chen, P.M.: Backtracking intrusions. ACM Trans. Comput. Syst. (TOCS) 23(1), 51–76 (2005)
Article Google Scholar
King, S.T., Dunlap, G.W., Chen, P.M.: Debugging operating systems with time-traveling virtual machines. In: Proceedings of the Annual Conference on USENIX Annual Technical Conference, p. 1 (2005)
Google Scholar
LeBlanc, T.J., Mellor-Crummey, J.M.: Debugging parallel programs with instant replay. IEEE Trans. Comput. 36(4), 471–482 (1987)
Article Google Scholar
Li, P., Liu, L., Gao, D., Reiter, M.K.: On challenges in evaluating malware clustering. In: Jha, S., Sommer, R., Kreibich, C. (eds.) RAID 2010. LNCS, vol. 6307, pp. 238–255. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15512-3_13
Chapter Google Scholar
Luk, C.K., Cohn, R., Muth, R., Patil, H., Klauser, A., Lowney, G., Wallace, S., Reddi, V.J., Hazelwood, K.: Pin: building customized program analysis tools with dynamic instrumentation. In: ACM SIGPLAN Notices, vol. 40, pp. 190–200. ACM (2005)
Article Google Scholar
Mandl, T., Bayer, U., Nentwich, F.: ANUBIS ANalyzing unknown BInarieS the automatic way. In: Virus Bulletin Conference, vol. 1, p. 2 (2009)
Google Scholar
Mohaisen, A., Alrawi, O.: AV-meter: an evaluation of antivirus scans and labels. In: Dietrich, S. (ed.) DIMVA 2014. LNCS, vol. 8550, pp. 112–131. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08509-8_7
Chapter Google Scholar
Moser, A., Kruegel, C., Kirda, E.: Limits of static analysis for malware detection. In: Twenty-Third Annual Computer Security Applications Conference, ACSAC 2007, pp. 421–430. IEEE (2007)
Google Scholar
Nethercote, N., Seward, J.: Valgrind: a framework for heavyweight dynamic binary instrumentation. In: ACM SIGPLAN Notices, vol. 42, pp. 89–100. ACM (2007)
Article Google Scholar
Newsome, J.: Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. In: Network and Distributed System Security Symposium (NDSS) (2005)
Google Scholar
Prakash, A., Yin, H., Liang, Z.: Enforcing system-wide control flow integrity for exploit detection and diagnosis. In: ACM SIGSAC Symposium on Information, Computer and Communications Security (2013)
Google Scholar
Quynh, N.A.: Capstone: Next-Gen Disassembly Framework. Black Hat USA (2014)
Google Scholar
Sebastián, M., Rivera, R., Kotzias, P., Caballero, J.: AVclass: a tool for massive malware labeling. In: Monrose, F., Dacier, M., Blanc, G., Garcia-Alfaro, J. (eds.) RAID 2016. LNCS, vol. 9854, pp. 230–253. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-45719-2_11
Chapter Google Scholar
Song, D., et al.: BitBlaze: a new approach to computer security via binary analysis. In: Sekar, R., Pujari, A.K. (eds.) ICISS 2008. LNCS, vol. 5352, pp. 1–25. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89862-7_1
Chapter Google Scholar
Tian, K., Yao, D., Ryder, B.G., Tan, G.: Analysis of code heterogeneity for high-precision classification of repackaged malware. In: 2016 IEEE Security and Privacy Workshops (SPW), pp. 262–271, May 2016
Google Scholar
Upchurch, J., Zhou, X.: Malware provenance: code reuse detection in malicious software at scale. In: 2016 11th International Conference on Malicious and Unwanted Software (MALWARE), pp. 1–9, October 2016
Google Scholar
VMWare: Enhanced Execution Record/Replay in Workstation 6.5, April 2008
Google Scholar
Walters, A.: The Volatility framework: Volatile memory artifact extraction utility framework. https://www.volatilesystems.com/default/volatility
Willems, C., Holz, T., Freiling, F.: Toward automated dynamic malware analysis using CWSandbox. IEEE Secur. Priv. 5(2), 32–39 (2007)
Article Google Scholar
Yan, L.K., Jayachandra, M., Zhang, M., Yin, H.: V2E: combining hardware virtualization and software emulation for transparent and extensible malware analysis. ACM SIGPLAN Not. 47(7), 227–238 (2012)
Article Google Scholar
Zhou, Y., Jiang, X.: Dissecting android malware: characterization and evolution. In: 2012 IEEE Symposium on Security and Privacy, pp. 95–109, May 2012
Google Scholar

Download references

Acknowledgments

We would like to thank our anonymous reviewers for their helpful feedback, as well as Paul Royal and the Georgia Tech Institute for Information Security and Privacy for their help in obtaining malware samples for Malrec. Funding for this research was provided under NSF Award #1657199.

Author information

Authors and Affiliations

Sapienza University of Rome, Rome, Italy
Giorgio Severi
MIT Lincoln Laboratory, Lexington, USA
Tim Leek
New York University, New York, USA
Brendan Dolan-Gavitt

Authors

Giorgio Severi
View author publications
You can also search for this author in PubMed Google Scholar
Tim Leek
View author publications
You can also search for this author in PubMed Google Scholar
Brendan Dolan-Gavitt
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Giorgio Severi .

Editor information

Editors and Affiliations

Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Cristiano Giuffrida
CEA, Palaiseau, France
Sébastien Bardin
Université Paris-Saclay, Evry, France
Gregory Blanc

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Severi, G., Leek, T., Dolan-Gavitt, B. (2018). Malrec: Compact Full-Trace Malware Recording for Retrospective Deep Analysis. In: Giuffrida, C., Bardin, S., Blanc, G. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2018. Lecture Notes in Computer Science(), vol 10885. Springer, Cham. https://doi.org/10.1007/978-3-319-93411-2_1

Download citation

DOI: https://doi.org/10.1007/978-3-319-93411-2_1
Published: 08 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93410-5
Online ISBN: 978-3-319-93411-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics