Abstract
This paper presents techniques for fast content-based search in large malware collections. The ability to retrieve malware samples that share given content is important for malware researchers who look for previous instances of a new sample or test new signatures. We propose a data structure that supports fast searches and can be continuously extended with new samples. The performance and scalability of our solution are demonstrated through experiments on real-world malware.
Acknowledgment
Research supported, in part, by EC H2020 SMESEC GA #740787 and EC H2020 CIPSEC GA #700378.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Mihalca, A., Oprişa, C. (2019). Full Content Search in Malware Collections. In: Fournaris, A., Lampropoulos, K., Marín Tordera, E. (eds) Information and Operational Technology Security Systems. IOSec 2018. Lecture Notes in Computer Science, vol 11398. Springer, Cham. https://doi.org/10.1007/978-3-030-12085-6_12
DOI: https://doi.org/10.1007/978-3-030-12085-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-12084-9
Online ISBN: 978-3-030-12085-6
eBook Packages: Computer Science (R0)