A Cybercrime Forensic Method for Chinese Web Information Authorship Analysis

  • Jianbin Ma
  • Guifa Teng
  • Yuxin Zhang
  • Yueli Li
  • Ying Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5477)


As the Internet has grown in popularity, the use of Internet services for illegal purposes has become a serious problem, and how to prevent such misuse is now a major concern for society. This paper describes a cybercrime forensic method for authorship analysis of illegal Chinese web information. Various writing-style features, including linguistic and structural features, are extracted. To identify the author of a web document, a support vector machine (SVM) is trained on each author's features. Experiments on Chinese blog, BBS, and e-mail datasets gave satisfactory results: accuracy on the blog dataset with seven authors was 89.49%. These results suggest that the method is feasible for cybercrime forensic applications.
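As a rough illustration of the feature-extraction step, the sketch below maps one document to a fixed-length writing-style vector. The specific features (a handful of Chinese punctuation marks, five common function characters, and three layout counts) and the names `CHINESE_PUNCTUATION`, `FUNCTION_CHARS`, and `extract_features` are assumptions made for this example; the abstract does not list the feature set actually used in the paper.

```python
from collections import Counter

# Hypothetical feature choices for illustration only; the abstract does not
# enumerate the paper's actual linguistic or structural features.
CHINESE_PUNCTUATION = "，。！？；：、"
FUNCTION_CHARS = "的了是在我"


def extract_features(text: str) -> list[float]:
    """Map one document to a fixed-length writing-style vector."""
    lines = text.split("\n")
    non_blank = [ln for ln in lines if ln.strip()]
    chars = [c for c in text if not c.isspace()]
    n_chars = max(len(chars), 1)  # avoid division by zero on empty input
    counts = Counter(chars)

    # Linguistic features: relative frequency of each punctuation mark
    # and each common function character.
    linguistic = [counts[c] / n_chars for c in CHINESE_PUNCTUATION + FUNCTION_CHARS]

    # Structural features: number of non-blank lines, mean characters per
    # non-blank line, and number of blank lines.
    structural = [
        float(len(non_blank)),
        len(chars) / max(len(non_blank), 1),
        float(len(lines) - len(non_blank)),
    ]
    return linguistic + structural
```

Vectors of this kind, one per document and labeled with the known author, could then be passed to any SVM implementation (for example scikit-learn's `sklearn.svm.SVC`) for training and prediction.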


cybercrime forensic; Chinese web information; authorship analysis; Support Vector Machine; feature extraction





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Jianbin Ma (1)
  • Guifa Teng (1)
  • Yuxin Zhang (1)
  • Yueli Li (1)
  • Ying Li (2)

  1. College of Information Science and Technology, Agricultural University of Hebei, Baoding, China
  2. College of Sciences, Agricultural University of Hebei, Baoding, China
