Full Content Search in Malware Collections

  • Conference paper
Information and Operational Technology Security Systems (IOSec 2018)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11398)

Abstract

This paper presents techniques for fast content-based searches in large malware collections. The ability to retrieve malware samples that share a given content is important for malware researchers who look for previous instances of a new sample or who test new signatures. We propose a data structure that allows fast searches and can be continuously expanded with new samples. The performance and scalability of our solution are demonstrated through experiments on real-world malware.
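
The abstract does not describe the proposed data structure, so the following is only a minimal sketch of one common way to get both properties it names (fast search by content, incremental insertion of new samples): an inverted index from byte n-grams to sample identifiers, followed by a verification pass. The class name `NgramIndex`, the n-gram length of 4, and the in-memory storage are all assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


class NgramIndex:
    """Toy inverted index from byte n-grams to sample ids (illustrative only)."""

    def __init__(self, n=4):
        self.n = n                         # n-gram length (assumed value)
        self.postings = defaultdict(set)   # n-gram -> ids of samples containing it
        self.samples = {}                  # sample id -> raw bytes

    def add(self, sample_id, data):
        """Index a new sample; entries for earlier samples are left untouched."""
        self.samples[sample_id] = data
        for i in range(len(data) - self.n + 1):
            self.postings[data[i:i + self.n]].add(sample_id)

    def search(self, pattern):
        """Return the sorted ids of samples containing `pattern` as a substring."""
        if len(pattern) < self.n:
            # Too short to use the index: fall back to scanning every sample.
            candidates = set(self.samples)
        else:
            grams = {pattern[i:i + self.n]
                     for i in range(len(pattern) - self.n + 1)}
            # A candidate must contain every n-gram of the pattern.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams))
        # Verify candidates: sharing all n-grams does not guarantee a match.
        return sorted(s for s in candidates if pattern in self.samples[s])


index = NgramIndex(n=4)
index.add("a", b"\x90\x90MZ\x00payload-one")
index.add("b", b"header payload-two")
index.add("c", b"abcdXbcde")
hits = index.search(b"payload")
```

Here `hits` contains the ids of the two samples that include the bytes `payload`. The verification step matters: sample `"c"` contains both 4-grams of `abcde` but not the string itself, so the index alone would report a false positive. A production system would keep the postings on disk and would not hold raw samples in memory.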

Acknowledgment

Research supported, in part, by EC H2020 SMESEC GA #740787 and EC H2020 CIPSEC GA #700378.

Author information

Corresponding author

Correspondence to Andrei Mihalca.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Mihalca, A., Oprişa, C. (2019). Full Content Search in Malware Collections. In: Fournaris, A., Lampropoulos, K., Marín Tordera, E. (eds) Information and Operational Technology Security Systems. IOSec 2018. Lecture Notes in Computer Science, vol 11398. Springer, Cham. https://doi.org/10.1007/978-3-030-12085-6_12

  • DOI: https://doi.org/10.1007/978-3-030-12085-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-12084-9

  • Online ISBN: 978-3-030-12085-6

  • eBook Packages: Computer Science, Computer Science (R0)