A framework for crime data analysis using relationship among named entities

Das, Priyanka; Das, Asit Kumar; Nayak, Janmenjoy; Pelusi, Danilo

doi:10.1007/s00521-019-04150-8

Soft Computing Techniques: Applications and Challenges
Published: 22 March 2019

Volume 32, pages 7671–7689, (2020)
Neural Computing and Applications

Priyanka Das¹ (ORCID: orcid.org/0000-0002-4729-7652), Asit Kumar Das¹, Janmenjoy Nayak² & Danilo Pelusi³

549 Accesses
5 Citations
Abstract

Many crime reports are available online through blogs and newswire services. Manually annotating this volume of reports for crime data analysis is tedious, yet taken together the reports give an overall picture of crime across the world. This motivates a framework for crime data analysis based on online reports. The method first extracts the crime reports and identifies the named entities in them. The sequence of context words between each consecutive pair of named entities is termed a crime vector, and it describes the relationship between the two entities. Feature vectors for each entity pair are generated from these crime vectors using the Word2Vec model. The paper considers three types of named entity pairs to support the main crime data analysis tasks and, for each type, measures the similarity between entity pairs using their respective feature vectors. For each type of named entity pair, a separate weighted graph is generated, with entity pairs as vertices and the similarity score between two entity pairs as the weight of the corresponding edge. Infomap, a graph-based clustering algorithm, is then applied to obtain an optimal set of clusters of entity pairs, together with a representative entity pair for each cluster. Each cluster is labelled with the relationship expressed by the crime vector of its representative entity pair. In practice, not every entity pair in a cluster is contextually similar to the representative entity pair, so the clusters are further partitioned into subclusters using a WordNet-based path similarity measure, which makes the entity pairs in each subcluster more contextually similar than they were in the original cluster. These subclusters provide various statistical crime information over the time period. The method is evaluated only on crime reports related to crimes against women in India. The experimental results demonstrate the effectiveness of the method and its superiority over other methods for analysing crime data.
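
The first steps of the pipeline described in the abstract (crime vectors, feature vectors and the weighted similarity graph) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several points are assumptions made only for illustration: the named entities are taken as already identified, a small hand-written dictionary (TOY_EMBEDDINGS) stands in for a trained Word2Vec model, a feature vector is taken as the average of the context word embeddings, and cosine similarity is used as the similarity score. The abstract does not state which aggregation or similarity function the paper uses, and all function names and placeholder tokens here are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings standing in for a trained Word2Vec model.
TOY_EMBEDDINGS = {
    "arrested": [0.9, 0.1, 0.0],
    "detained": [0.8, 0.2, 0.0],
    "assaulted": [0.0, 0.9, 0.1],
    "in": [0.1, 0.1, 0.1],
}


def crime_vectors(tokens, entity_positions):
    """Map each consecutive entity pair to the context words between them."""
    pairs = {}
    for left, right in zip(entity_positions, entity_positions[1:]):
        context = tokens[left + 1:right]
        if context:  # skip adjacent entities with no words in between
            pairs[(tokens[left], tokens[right])] = context
    return pairs


def feature_vector(context, embeddings):
    """Average the embeddings of known context words (an assumed aggregation)."""
    known = [embeddings[word] for word in context if word in embeddings]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_graph(features):
    """Weighted graph: vertices are entity pairs, edge weights are similarities."""
    return {
        (a, b): cosine(features[a], features[b])
        for a, b in combinations(features, 2)
    }


# Placeholder reports: (tokens, indices of the named entities in the tokens).
reports = [
    (["ORG_1", "arrested", "PERSON_1", "in", "LOC_1"], [0, 2, 4]),
    (["ORG_2", "detained", "PERSON_2", "in", "LOC_2"], [0, 2, 4]),
    (["PERSON_3", "assaulted", "PERSON_4"], [0, 2]),
]

vectors = {}
for tokens, positions in reports:
    vectors.update(crime_vectors(tokens, positions))

features = {}
for pair, context in vectors.items():
    vec = feature_vector(context, TOY_EMBEDDINGS)
    if vec is not None:
        features[pair] = vec

graph = similarity_graph(features)
```

In this sketch all entity pairs share one graph for brevity, whereas the abstract describes a separate graph for each of the three entity pair types. The resulting edge weights would then be passed to a graph clustering algorithm such as Infomap, which is left out here.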

Author information

Authors and Affiliations

Department of Computer Science and Technology, Indian Institute of Engineering Science and Technology, Shibpur, Howrah, 711103, India
Priyanka Das & Asit Kumar Das
Department of Computer Science and Engineering, Sri Sivani College of Engineering, Chilakapalem, Andhra Pradesh, India
Janmenjoy Nayak
Department of Communications Sciences, University of Teramo, Teramo, Italy
Danilo Pelusi

Corresponding author

Correspondence to Priyanka Das.

Ethics declarations

Conflict of interest

The authors declare that this manuscript has no conflict of interest with any other published source and has not been published previously (partly or in full). No data have been fabricated or manipulated to support our conclusion.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Das, P., Das, A.K., Nayak, J. et al. A framework for crime data analysis using relationship among named entities. Neural Comput & Applic 32, 7671–7689 (2020). https://doi.org/10.1007/s00521-019-04150-8

Received: 12 November 2018
Accepted: 12 March 2019
Published: 22 March 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s00521-019-04150-8
