Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees

  • Conference paper

Abstract

Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of “known-illegal” files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the seized drive. Traditional hash matching techniques can efficiently find files that precisely match. However, these will fail in the case of merged files, embedded files, partial files, or if a file has been changed in any way.

In recent years, approximate matching algorithms have shown significant promise in the detection of files that have a high bytewise similarity. This paper focuses on MRSH-v2. A number of experiments were conducted using Hierarchical Bloom Filter Trees to dramatically reduce the quantity of pairwise comparisons that must be made between known-illegal files and files on the seized disk. The experiments demonstrate substantial speed gains over the original MRSH-v2, while maintaining effectiveness.
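
The idea of pruning pairwise comparisons with a tree of Bloom filters can be illustrated with a minimal sketch. It is not the authors' implementation: the names `BloomFilter`, `Node`, `chunks`, `build_tree` and `search`, the fixed 64-byte chunking, the SHA-256-derived bit positions and the `min_hits` threshold are all assumptions made for illustration (MRSH-v2 itself selects chunk boundaries with a rolling hash). Each leaf holds the filter for one known file, each internal node holds the union of its children's filters, and a query descends only into subtrees whose filter contains enough of its chunks.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter backed by a Python int used as a bit set."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes  # at most 8 with the 32-byte digest below
        self.bits = 0

    def _positions(self, item):
        # Illustrative choice: derive bit positions from slices of one SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        merged = BloomFilter(self.size_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


class Node:
    def __init__(self, bloom, left=None, right=None, name=None):
        self.bloom, self.left, self.right, self.name = bloom, left, right, name


def chunks(data, size=64):
    # Simplification: fixed-size chunks instead of rolling-hash boundaries.
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_tree(files):
    """Build the tree bottom-up from a non-empty dict of name -> bytes."""
    nodes = []
    for name, data in files.items():
        bloom = BloomFilter()
        for chunk in chunks(data):
            bloom.add(chunk)
        nodes.append(Node(bloom, name=name))
    while len(nodes) > 1:
        paired = []
        for i in range(0, len(nodes), 2):
            if i + 1 < len(nodes):
                left, right = nodes[i], nodes[i + 1]
                paired.append(Node(left.bloom.union(right.bloom), left, right))
            else:
                paired.append(nodes[i])  # odd node moves up unchanged
        nodes = paired
    return nodes[0]


def search(node, query_chunks, min_hits=1):
    """Return names of known files whose leaf filter holds >= min_hits query chunks."""
    hits = sum(1 for chunk in query_chunks if chunk in node.bloom)
    if hits < min_hits:
        return []  # prune: nothing below this node can match
    if node.name is not None:
        return [node.name]
    return (search(node.left, query_chunks, min_hits)
            + search(node.right, query_chunks, min_hits))


# Hypothetical data: five "known" files and a modified copy of one of them.
known = {name: hashlib.sha256(name.encode()).digest() * 8
         for name in ("a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg")}
root = build_tree(known)
modified = known["c.jpg"] + b"appended bytes"
print(search(root, chunks(modified), min_hits=2))
```

In this sketch the modified copy no longer shares a cryptographic hash with the original, yet the tree search can still reach the matching leaf while skipping subtrees whose filters contain none of the query's chunks. Because Bloom filters admit false positives, a leaf reached this way is a candidate for a full similarity comparison rather than a confirmed match.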

Corresponding author

Correspondence to David Lillis.

Copyright information

© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Lillis, D., Breitinger, F., Scanlon, M. (2018). Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees. In: Matoušek, P., Schmiedecker, M. (eds) Digital Forensics and Cyber Crime. ICDF2C 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 216. Springer, Cham. https://doi.org/10.1007/978-3-319-73697-6_11

  • DOI: https://doi.org/10.1007/978-3-319-73697-6_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73696-9

  • Online ISBN: 978-3-319-73697-6

  • eBook Packages: Computer Science, Computer Science (R0)
