N-Gram-Based Recognition of Threatening Tweets
In this paper, we investigate to what degree it is possible to recognize threats in Dutch tweets. We attempt threat recognition on the basis of a single tweet alone (without further context), using only very simple recognition features, namely n-grams. We present two different methods of n-gram-based recognition, one based on manually constructed n-gram patterns and the other on machine-learned patterns. Our evaluation is not restricted to precision and recall scores: we also examine how the yield of the two methods differs, with a view to either combining them or finding ways to refine each method individually.
Keywords: social media, text mining, text classification, manually constructed rules, machine learning
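
As an illustration of what matching a single tweet against manually constructed n-gram patterns can look like, the following minimal Python sketch extracts word n-grams and checks them against a small pattern set. It is not the paper's implementation: the whitespace tokenization, the function names `word_ngrams` and `matches_manual_pattern`, and the English placeholder patterns are all assumptions made for this example (the paper itself works on Dutch tweets).

```python
from typing import List, Set, Tuple

NGram = Tuple[str, ...]


def word_ngrams(text: str, n: int) -> List[NGram]:
    """Return all word n-grams of length n, using lowercased whitespace tokens."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical hand-made patterns, for illustration only.
MANUAL_PATTERNS: Set[NGram] = {
    ("i", "will", "kill"),
    ("kill", "you"),
}


def matches_manual_pattern(tweet: str, patterns: Set[NGram] = MANUAL_PATTERNS) -> bool:
    """Flag a tweet if any of its n-grams equals one of the given patterns."""
    sizes = {len(p) for p in patterns}  # only extract n-gram lengths that occur
    return any(gram in patterns for n in sizes for gram in word_ngrams(tweet, n))


if __name__ == "__main__":
    for tweet in ["I will kill you tomorrow", "I will call you tomorrow"]:
        print(tweet, "->", matches_manual_pattern(tweet))
```

A machine-learned variant would presumably replace the fixed pattern set with n-gram features weighted by a trained classifier. Punctuation handling and spelling normalization, both of which matter on Twitter, are deliberately left out of this sketch.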