Dark Web pp 153-169

Authorship Analysis

  • Hsinchun Chen
Chapter
Part of the Integrated Series in Information Systems book series (ISIS, volume 30)

Abstract

Following the tragic events of September 11, 2001, researchers have been called upon to assume a larger role in the preservation of public safety and national security. One of the major challenges facing the intelligence and security community is the monitoring of online communication media that are commonly used by terrorist groups. In this chapter, we addressed the online anonymity problem by applying authorship analysis to English and Arabic extremist group web forum messages. The performance impact of different feature categories and techniques was evaluated across both languages. To facilitate enhanced writing style identification, a comprehensive list of online authorship features was incorporated. Additionally, an Arabic language model was created by adopting specific features and techniques to deal with the challenging linguistic characteristics of Arabic, including an elongation filter and a root clustering algorithm. A series of experiments was conducted to evaluate the efficacy of our models, with results indicating a high level of success. Finally, the English and Arabic language models and messages were compared to aid the research community's understanding of the dynamics of these groups' authorship tendencies.
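As an illustration of the elongation filter mentioned above, the following is a minimal sketch and not the chapter's implementation. It assumes that elongation is written with the Arabic tatweel (kashida) character, U+0640, and that the filter both strips that character and counts the words that contained it; the function name `filter_elongation` is hypothetical.

```python
from typing import Tuple

# Assumption: stylistic elongation is encoded with ARABIC TATWEEL (U+0640).
TATWEEL = "\u0640"


def filter_elongation(text: str) -> Tuple[str, int]:
    """Strip tatweel from each whitespace-separated word.

    Returns the cleaned text and the number of words that contained
    at least one tatweel character (a possible stylistic feature).
    """
    words = text.split()
    elongated_count = sum(1 for word in words if TATWEEL in word)
    cleaned = " ".join(word.replace(TATWEEL, "") for word in words)
    return cleaned, elongated_count
```

Stripping the elongation character before feature extraction lets elongated and plain spellings of the same word be counted as one lexical item, while the returned count keeps the elongation habit itself available as a feature.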

Keywords

Support Vector Machine; Function Word; Decision Tree Analysis; Lexical Feature; Arabic Word
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgments

This research was supported by the following grant: NSF, ITR-0326348, 2003–2005, “ITR: COPLINK Center for Intelligence and Security Informatics Research – A Crime Data Mining Approach to Developing Border Safe Research.” The authors also express their gratitude for the research assistance provided by fellow members of the Dark Web Project team in the Artificial Intelligence Lab, including Jialun Qin, Yilu Zhou, Greg Lai, and a couple of team members who wish to remain anonymous.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Management Information Systems, University of Arizona, Tucson, USA