Following the tragic events of September 11, 2001, researchers have been called upon to assume a larger role in the preservation of public safety and national security. One of the major challenges facing the intelligence and security community is monitoring the online communication media commonly used by terrorist groups. In this chapter, we addressed the online anonymity problem by applying authorship analysis to English and Arabic extremist group web forum messages. The performance impact of different feature categories and techniques was evaluated across both languages. To improve writing style identification, a comprehensive list of online authorship features was incorporated. Additionally, an Arabic language model was created by adopting specific features and techniques, including an elongation filter and a root clustering algorithm, to deal with the challenging linguistic characteristics of Arabic. A series of experiments was conducted to evaluate the efficacy of our models, with results indicating a high level of success. Finally, the English and Arabic language models and messages were compared to aid the research community’s understanding of the dynamics of these groups’ authorship tendencies.
Keywords: Support Vector Machine, Function Word, Decision Tree Analysis, Lexical Feature, Arabic Word
This research was supported by the following grant: NSF, ITR-0326348, 2003–2005, “ITR: COPLINK Center for Intelligence and Security Informatics Research – A Crime Data Mining Approach to Developing Border Safe Research.” The authors also express their gratitude for the research assistance provided by fellow members of the Dark Web Project team in the Artificial Intelligence Lab, including Jialun Qin, Yilu Zhou, Greg Lai, and a couple of team members who wish to remain anonymous.