Abstract
This work presents a rough set based feature selection scheme for selecting crime features from online newspaper reports of crimes committed against women in India. Only the verbs present in the crime reports are extracted as features for the crime analysis task. To retain only distinct verbs, words that share common synonyms are identified and replaced by a single word. The resulting feature set usually contains many irrelevant features alongside the relevant ones, so selecting only the relevant features is essential for accurate classification. In the proposed work, the relative indiscernibility relation from rough set theory is used to measure the similarity score between two features relative to the crime type. A weighted undirected graph is then generated, with the features as nodes and the inverse similarity score between two features as the weight of the corresponding edge. Prim’s algorithm is applied to obtain a minimum spanning tree. Finally, a feature selection algorithm iteratively selects the highest-degree node and removes it from the tree until the modified graph becomes a null graph. The selected nodes are taken as the important features, sufficient for categorizing crime reports.
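The graph stage of this pipeline can be illustrated with a minimal Python sketch. It is not the authors' implementation. The verb names and edge weights are invented for illustration, the weights stand in for inverse similarity scores that the paper derives from the relative indiscernibility relation (not computed here), and "null graph" is assumed to mean a graph with no remaining edges. The function names `prim_mst` and `select_features` are hypothetical, and ties in degree are broken alphabetically, which is also an assumption.

```python
import heapq
from collections import defaultdict

# Hypothetical verb features and hypothetical inverse-similarity weights.
FEATURES = ["abduct", "assault", "harass", "kill", "threaten"]
WEIGHTS = {
    ("assault", "kill"): 1.0,
    ("assault", "harass"): 1.5,
    ("assault", "threaten"): 2.0,
    ("abduct", "threaten"): 2.5,
    ("harass", "kill"): 3.0,
    ("abduct", "kill"): 4.0,
}

def prim_mst(nodes, weights):
    """Return the edges of a minimum spanning tree using Prim's algorithm."""
    adj = defaultdict(list)
    for (u, v), w in weights.items():
        adj[u].append((w, v))
        adj[v].append((w, u))
    start = nodes[0]
    visited = {start}
    # Heap entries are (weight, node already in tree, candidate node).
    heap = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(heap)
    mst = []
    while heap and len(visited) < len(nodes):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue
        visited.add(v)
        mst.append((u, v))
        for w2, x in adj[v]:
            if x not in visited:
                heapq.heappush(heap, (w2, v, x))
    return mst

def select_features(nodes, tree_edges):
    """Repeatedly pick and remove the highest-degree node until no edges remain."""
    neighbours = {n: set() for n in nodes}
    for u, v in tree_edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    selected = []
    while any(neighbours.values()):
        # Assumption: ties in degree are broken by alphabetical order.
        best = max(sorted(neighbours), key=lambda n: len(neighbours[n]))
        selected.append(best)
        for n in neighbours.pop(best):
            neighbours[n].discard(best)
    return selected

mst = prim_mst(FEATURES, WEIGHTS)
selected = select_features(FEATURES, mst)
print(selected)
```

Removing a selected node also deletes its incident edges, so each selected feature accounts for the neighbours it was most similar to in the tree. The loop stops as soon as no edges are left, and the nodes that were never selected are the discarded features.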
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Das, P., Das, A.K. (2020). Crime Feature Selection Constructing Weighted Spanning Tree. In: Das, A., Nayak, J., Naik, B., Pati, S., Pelusi, D. (eds) Computational Intelligence in Pattern Recognition. Advances in Intelligent Systems and Computing, vol 999. Springer, Singapore. https://doi.org/10.1007/978-981-13-9042-5_33
DOI: https://doi.org/10.1007/978-981-13-9042-5_33
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9041-8
Online ISBN: 978-981-13-9042-5
eBook Packages: Intelligent Technologies and Robotics (R0)