Performance Evaluation of a Natural Language Processing Approach Applied in White Collar Crime Investigation

van Banerveld, Maarten; Le-Khac, Nhien-An; Kechadi, M-Tahar

doi:10.1007/978-3-319-12778-1_3

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8860)

Included in the following conference series:

International Conference on Future Data and Security Engineering

Abstract

Every day we are confronted with growing amounts of information from a wide variety of sources. People and corporations produce data on a large scale, and since the rise of the internet, e-mail and social media the amount of data produced has grown exponentially. From a law enforcement perspective, these huge amounts of data have to be dealt with whenever a criminal investigation is launched against an individual or a company. Relevant questions must be answered: who committed the crime, who was involved, what happened and when, and who was communicating and about what? Not only the amount of data available for investigation but also its complexity has increased enormously. When these communication patterns need to be combined with, for instance, a seized financial administration or corporate document shares, a complex investigation problem arises. Criminal investigators now face a huge challenge when evidence of a crime has to be found in a Big Data environment, where they must deal with large and complex datasets, especially in financial and fraud investigations. To tackle this problem, a financial and fraud investigation unit of a European country has developed a new tool named LES that uses Natural Language Processing (NLP) techniques to help criminal investigators handle large amounts of textual information more efficiently and quickly. In this paper, we briefly present this tool and focus on evaluating its performance against the requirements of forensic investigation: making the work faster, smarter and easier for investigators. We evaluate LES using several performance metrics and report experimental results on large and complex datasets from real-world applications.

Author information

Authors and Affiliations

Surinameweg 4, 2035 VA, Haarlem, The Netherlands
Maarten van Banerveld
School of Computer Science & Informatics, University College Dublin, Belfield, Dublin 4, Ireland
Nhien-An Le-Khac & M-Tahar Kechadi

Editor information

Editors and Affiliations

Ho Chi Minh City University of Technology, 268 Ly Thuong Kiet Street, District 10, Ho Chi Minh City, Vietnam
Tran Khanh Dang & Nam Thoai
Johannes Kepler University Linz, Altenberger Straße 69, 4040, Linz, Austria
Roland Wagner & Josef Küng
University of Vienna, Währinger Straße 29, 1190, Wien, Austria
Erich Neuhold
Hosei University, 3-7-2, Kajino-machi, 184-8584, Koganei-shi, Tokyo, Japan
Makoto Takizawa

About this paper

Cite this paper

van Banerveld, M., Le-Khac, NA., Kechadi, MT. (2014). Performance Evaluation of a Natural Language Processing Approach Applied in White Collar Crime Investigation. In: Dang, T.K., Wagner, R., Neuhold, E., Takizawa, M., Küng, J., Thoai, N. (eds) Future Data and Security Engineering. FDSE 2014. Lecture Notes in Computer Science, vol 8860. Springer, Cham. https://doi.org/10.1007/978-3-319-12778-1_3

DOI: https://doi.org/10.1007/978-3-319-12778-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12777-4
Online ISBN: 978-3-319-12778-1
eBook Packages: Computer Science, Computer Science (R0)
