Text Clustering for Digital Forensics Analysis

Decherchi, Sergio; Tacconi, Simone; Redi, Judith; Leoncini, Alessio; Sangiacomo, Fabio; Zunino, Rodolfo

doi:10.1007/978-3-642-04091-7_4

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 63)

Abstract

In recent decades, digital forensics has become a prominent activity in modern investigations. An important source of evidence is often the information stored on the devices under investigation. Because this kind of inquiry is complex, the digital tools that support it are a central concern. This paper introduces a clustering-based text mining technique for investigative purposes. The proposed methodology is applied experimentally to the publicly available Enron dataset, which fits a plausible forensic analysis context well.

Author information

Authors and Affiliations

Dept. Biophysical and Electronic Engineering, University of Genoa, 16145, Genova, Italy
Sergio Decherchi, Judith Redi, Alessio Leoncini, Fabio Sangiacomo & Rodolfo Zunino
Servizio Polizia Postale e delle Comunicazioni, Ministero dell’Interno,
Simone Tacconi

Editor information

Editors and Affiliations

Grupo de Inteligencia Computacional Aplicada, Área de Lenguajes y Sistemas Informáticos, Escuela Politécnica Superior, Universidad de Burgos, Calle Francisco de Vitoria S/N, Edificio C, 09006, Burgos, Spain
Álvaro Herrero
Dept. of Biophysical and Electronic Engineering, Genova University, Via Opera Pia 11a, 16145, Genova, Italy
Paolo Gastaldo
Dept. of Biophysical and Electronic Engineering, Genova University, Via Opera Pia 11a, 16145, Genova, Italy
Rodolfo Zunino
Grupo de Inteligencia Computacional Aplicada, Área de Lenguajes y Sistemas Informáticos, Escuela Politécnica Superior, Universidad de Burgos, Calle Francisco de Vitoria S/N, Edificio C, 09006, Burgos, Spain
Emilio Corchado

About this paper

Cite this paper

Decherchi, S., Tacconi, S., Redi, J., Leoncini, A., Sangiacomo, F., Zunino, R. (2009). Text Clustering for Digital Forensics Analysis. In: Herrero, Á., Gastaldo, P., Zunino, R., Corchado, E. (eds) Computational Intelligence in Security for Information Systems. Advances in Intelligent and Soft Computing, vol 63. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04091-7_4

DOI: https://doi.org/10.1007/978-3-642-04091-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04090-0
Online ISBN: 978-3-642-04091-7
eBook Packages: Engineering, Engineering (R0)
