Detecting Predatory Behaviour from Online Textual Chats

  • Suraj Jung Pandey
  • Ioannis Klapaftis
  • Suresh Manandhar
Part of the Communications in Computer and Information Science book series (CCIS, volume 287)


This paper presents a novel methodology for learning the behavioural profiles of sexual predators using state-of-the-art machine learning and computational linguistics methods. The methodology aims to distinguish between predatory and non-predatory conversations and is evaluated on real-world data. Not all text fragments within a malicious chat are predatory in nature, so it is necessary to distinguish the predatory fragments from the non-predatory ones. This distinction is made using n-grams, which capture predatory sequences in conversations. The paper uses both content words and stylistic features of conversations as features. The content words are weighted using the tf-idf measure. Experiments show that content words alone are not enough to distinguish between predatory and non-predatory chats. Adding various stylistic features, however, improves the performance of the system.
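The feature construction described above can be illustrated with a minimal sketch. It assumes whitespace tokenisation, unigrams plus bigrams, and a plain tf × log(N/df) weighting; the abstract does not specify the paper's exact variants. The function names and the three stylistic cues are hypothetical, chosen only for illustration, and the toy chats are invented.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all word n-grams of length 1..n_max as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(chats, n_max=2):
    """Weight each n-gram by relative term frequency times log(N / df)."""
    docs = [Counter(ngrams(chat.lower().split(), n_max)) for chat in chats]
    doc_freq = Counter()
    for counts in docs:
        doc_freq.update(counts.keys())  # each chat counts once per n-gram
    n_docs = len(docs)
    vectors = []
    for counts in docs:
        total = sum(counts.values())
        vectors.append({
            gram: (count / total) * math.log(n_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return vectors


def stylistic_features(chat):
    """Hypothetical stylistic cues; the abstract does not list the real ones."""
    tokens = chat.split()
    n_tokens = max(len(tokens), 1)
    return {
        "avg_token_length": sum(len(t) for t in tokens) / n_tokens,
        "question_mark_ratio": chat.count("?") / n_tokens,
        "uppercase_ratio": sum(c.isupper() for c in chat) / max(len(chat), 1),
    }


# Invented toy chats, used only to exercise the functions.
chats = [
    "hi how old are you ?",
    "did you finish the homework ?",
    "the homework is due on monday",
]
vectors = tfidf_vectors(chats)
# Merge content-word weights and stylistic cues into one feature map per chat.
features = [{**vec, **stylistic_features(chat)} for vec, chat in zip(vectors, chats)]
```

Each resulting feature map could then be converted to a sparse vector and passed to a classifier such as an SVM, which the keywords suggest the paper uses.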


Keywords: natural language processing, SVM, text classification, offensive chats





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Suraj Jung Pandey (1)
  • Ioannis Klapaftis (1)
  • Suresh Manandhar (1)
  1. University of York, Heslington, UK
