Abstract
Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of “known-illegal” files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the seized drive. Traditional hash matching techniques can efficiently find files that precisely match. However, these will fail in the case of merged files, embedded files, partial files, or if a file has been changed in any way.
In recent years, approximate matching algorithms have shown significant promise in the detection of files that have a high bytewise similarity. This paper focuses on MRSH-v2. A number of experiments were conducted using Hierarchical Bloom Filter Trees to dramatically reduce the quantity of pairwise comparisons that must be made between known-illegal files and files on the seized disk. The experiments demonstrate substantial speed gains over the original MRSH-v2, while maintaining effectiveness.
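The pruning idea behind a hierarchical Bloom filter tree can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `BloomFilter`, `Node`, and `features` names, the filter size, the number of hash functions, and the use of fixed 8-byte chunks in place of MRSH-v2's feature extraction are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; size and hash count are illustrative assumptions."""

    def __init__(self, size_bits=4096, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))


def features(data, chunk=8):
    # Hypothetical stand-in for MRSH-v2 feature extraction: fixed-size chunks.
    return {data[i:i + chunk] for i in range(0, len(data) - chunk + 1, chunk)}


class Node:
    """Binary tree node whose filter covers every known file beneath it."""

    def __init__(self, files):
        self.names = sorted(files)
        self.bf = BloomFilter()
        for name in self.names:
            for feat in features(files[name]):
                self.bf.add(feat)
        self.children = []
        if len(self.names) > 1:
            mid = len(self.names) // 2
            self.children = [
                Node({n: files[n] for n in self.names[:mid]}),
                Node({n: files[n] for n in self.names[mid:]}),
            ]

    def candidates(self, query_features, min_hits=1):
        hits = sum(1 for feat in query_features if feat in self.bf)
        if hits < min_hits:
            return []  # no match here, so the whole subtree is skipped
        if not self.children:
            return list(self.names)  # leaf: a candidate for full comparison
        found = []
        for child in self.children:
            found.extend(child.candidates(query_features, min_hits))
        return found


# Made-up "known" files and a seized file that reuses one chunk of b.bin.
known = {
    "a.bin": b"alpha-01alpha-02alpha-03",
    "b.bin": b"bravo-01bravo-02bravo-03",
    "c.bin": b"charl-01charl-02charl-03",
    "d.bin": b"delta-01delta-02delta-03",
}
seized = b"bravo-02" + b"zzzzzzzz"

tree = Node(known)
result = tree.candidates(features(seized))
print(result)
```

A seized file is compared against the root filter first, and a subtree is explored only if its filter reports enough matching features. Because Bloom filters produce no false negatives, a known file that shares a feature with the query is always reached. False positives remain possible, so the returned leaves are only candidates for a full pairwise comparison.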
Copyright information
© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Lillis, D., Breitinger, F., Scanlon, M. (2018). Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees. In: Matoušek, P., Schmiedecker, M. (eds) Digital Forensics and Cyber Crime. ICDF2C 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 216. Springer, Cham. https://doi.org/10.1007/978-3-319-73697-6_11
DOI: https://doi.org/10.1007/978-3-319-73697-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73696-9
Online ISBN: 978-3-319-73697-6
eBook Packages: Computer Science, Computer Science (R0)