Abstract
Identifying the underlying type of a file from only a fragment of it is a major challenge in digital forensics. Many methods have been applied to file type identification; however, identification accuracy for most file types remains low, especially for files with complex structures, whose contents are compound data built from different data types. In this paper, we propose a new approach to identifying deflate-encoded file fragments, based on deflate-encoded data detection, entropy-based clustering, and machine learning techniques. Experiments on a popular compound file type showed high identification accuracy for the proposed method.
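Two of the building blocks named above can be illustrated with a short sketch. This is not the authors' method; it only assumes that "entropy" means Shannon entropy of the byte distribution and that "deflate-encoded data detection" can be approximated by attempting a raw DEFLATE (RFC 1951) decode. The function names `byte_entropy` and `looks_like_raw_deflate` and the `min_output` threshold are hypothetical. The clustering and machine learning stages are not shown.

```python
import math
import zlib
from collections import Counter


def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    n = len(fragment)
    counts = Counter(fragment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def looks_like_raw_deflate(fragment: bytes, min_output: int = 64) -> bool:
    """Hypothetical heuristic: does a raw DEFLATE stream begin at offset 0?

    A real fragment usually starts mid-stream (not on a block boundary), so a
    practical detector would have to search bit offsets; this only checks offset 0.
    """
    # Negative wbits selects raw DEFLATE with no zlib or gzip header.
    decoder = zlib.decompressobj(wbits=-15)
    try:
        output = decoder.decompress(fragment)
    except zlib.error:
        # Invalid block type, bad Huffman tables, etc.
        return False
    # min_output is an arbitrary, assumed threshold to reject trivial decodes.
    return len(output) >= min_output
```

Compressed data tends to have entropy close to 8 bits per byte, which is one reason entropy alone struggles to separate compressed fragments from other high-entropy data such as encrypted content, and why a structural check like trial decoding can add information.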
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nguyen, K., Tran, D., Ma, W., Sharma, D. (2015). A New Approach to Compressed File Fragment Identification. In: Herrero, Á., Baruque, B., Sedano, J., Quintián, H., Corchado, E. (eds) International Joint Conference. CISIS 2015. Advances in Intelligent Systems and Computing, vol 369. Springer, Cham. https://doi.org/10.1007/978-3-319-19713-5_32
DOI: https://doi.org/10.1007/978-3-319-19713-5_32
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19712-8
Online ISBN: 978-3-319-19713-5
eBook Packages: Engineering, Engineering (R0)