A New Approach to Compressed File Fragment Identification

  • Conference paper
International Joint Conference (CISIS 2015)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 369)

Abstract

Identifying the underlying type of a file given only a file fragment is a major challenge in digital forensics. Many methods have been applied to file type identification; however, identification accuracy for most file types is still very low, especially for files with complex structures, whose contents are compound data built from different data types. In this paper, we propose a new approach that combines deflate-encoded data detection, entropy-based clustering, and machine learning techniques to identify deflate-encoded file fragments. Experiments on the popular compound file type showed high identification accuracy for the proposed method.
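As a rough illustration of why entropy is a useful signal for this kind of fragment, the sketch below computes byte-level Shannon entropy and compares plain text with the same text after deflate compression. It is not the authors' method: the function name `shannon_entropy`, the word list, and the sample data are assumptions made for this example, and the abstract does not specify how the paper computes or clusters entropy.

```python
import math
import random
import zlib
from collections import Counter


def shannon_entropy(fragment: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical sample data: pseudo-random English-like text (fixed seed).
rng = random.Random(0)
words = ["file", "fragment", "deflate", "entropy", "cluster", "forensic", "type", "data"]
plain = " ".join(rng.choice(words) for _ in range(5000)).encode("ascii")

# zlib wraps a deflate stream; compressed bytes look close to random.
compressed = zlib.compress(plain, 9)

print(f"plain text entropy:      {shannon_entropy(plain):.2f} bits/byte")
print(f"deflate-encoded entropy: {shannon_entropy(compressed):.2f} bits/byte")
```

Deflate output typically sits near the 8 bits/byte ceiling, which is why entropy alone can separate compressed from uncompressed fragments but cannot distinguish one compressed format from another.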

Author information

Corresponding author

Correspondence to Khoa Nguyen.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, K., Tran, D., Ma, W., Sharma, D. (2015). A New Approach to Compressed File Fragment Identification. In: Herrero, Á., Baruque, B., Sedano, J., Quintián, H., Corchado, E. (eds) International Joint Conference. CISIS 2015. Advances in Intelligent Systems and Computing, vol 369. Springer, Cham. https://doi.org/10.1007/978-3-319-19713-5_32

  • DOI: https://doi.org/10.1007/978-3-319-19713-5_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-19712-8

  • Online ISBN: 978-3-319-19713-5

  • eBook Packages: Engineering (R0)
