Detecting Predatory Behaviour from Online Textual Chats
This paper presents a novel methodology for learning the behavioural profiles of sexual predators using state-of-the-art machine learning and computational linguistics methods. The methodology aims to distinguish between predatory and non-predatory conversations and is evaluated on real-world data. Not all text fragments within a malicious chat are predatory in nature, so it is necessary to distinguish the predatory fragments from the non-predatory ones. This distinction is made using n-grams, which capture predatory sequences in conversations. The paper uses both content words and stylistic features of conversations as features. The content words are weighted using the tf-idf measure. Experiments show that content words alone are not enough to distinguish between predatory and non-predatory chats. Adding various stylistic features, however, improves the performance of the system.
Keywords: natural language processing, SVM, text classification, offensive chats
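The feature construction outlined in the abstract (word n-grams, tf-idf weighted content words, and stylistic features) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. The tokeniser, the tf-idf variant (relative term frequency times log(N/df)), the choice of unigrams plus bigrams, and the three stylistic features are all assumptions, since the abstract does not specify them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokeniser)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(chats, n_max=2):
    """Weight each n-gram by relative term frequency times log(N / document frequency)."""
    docs = [Counter(word_ngrams(tokenize(chat), n_max)) for chat in chats]
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())  # each n-gram counted once per chat
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vec = {}
        for gram, count in doc.items():
            vec[gram] = (count / total) * math.log(n_docs / doc_freq[gram])
        vectors.append(vec)
    return vectors


def stylistic_features(text):
    """Hypothetical stylistic features; the paper's actual feature set is not given here."""
    tokens = tokenize(text)
    return {
        "avg_word_len": sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0,
        "question_marks": text.count("?"),
        "uppercase_ratio": sum(ch.isupper() for ch in text) / max(len(text), 1),
    }


# Toy data, invented for illustration only.
chats = ["you are nice", "you are home", "you like school"]
vectors = tfidf_vectors(chats)
print(sorted(vectors[0].items(), key=lambda kv: -kv[1])[:3])
print(stylistic_features("Are you OK?"))
```

In a setup like this, the tf-idf dictionary and the stylistic dictionary for each chat would be merged into one feature vector and passed to a classifier such as an SVM, which the keywords name as the learning method. An n-gram that occurs in every chat receives weight zero under this tf-idf variant, which is one reason content words alone can fail to separate the classes.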