Authorship Similarity Detection from Email Messages

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6871)

Abstract

It is easy to hide the true identity of the author of an email. The author's actual name, email address, and similar details can be changed arbitrarily to deceive the recipient. For example, a sender can change his or her identity in the email header when sending different emails to various recipients. Therefore, in this paper, we investigate techniques for detecting authorship similarity from the text content of short, topic-free emails. We identify 150 stylistic cues for this problem and propose a method based on frequent pattern mining and machine learning. Extensive experimental results on the Enron email data set are also presented.
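
The abstract does not spell out the cues or the mining procedure, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a handful of hypothetical cues with arbitrary thresholds (the paper uses 150), treats each email as a set of discretized cues, keeps the cue combinations that recur across an author's emails as frequent patterns, and compares two email collections by the Jaccard overlap of their pattern sets. All function names, cue names, and thresholds are illustrative assumptions.

```python
import re
from collections import Counter
from itertools import combinations


def stylistic_cues(text):
    """Map one email body to a small set of discretized style cues.

    The cues and thresholds are hypothetical placeholders, not the
    150 cues used in the paper.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    avg_sent_len = len(words) / max(len(sentences), 1)
    upper_ratio = sum(c.isupper() for c in text) / max(len(text), 1)

    cues = set()
    cues.add("long_words" if avg_word_len >= 5 else "short_words")
    cues.add("long_sentences" if avg_sent_len >= 12 else "short_sentences")
    if upper_ratio > 0.1:
        cues.add("many_capitals")
    if "!" in text:
        cues.add("uses_exclamation")
    return frozenset(cues)


def frequent_patterns(emails, min_support=0.6, max_size=2):
    """Return cue combinations present in at least min_support of the emails."""
    counts = Counter()
    for body in emails:
        cues = sorted(stylistic_cues(body))
        for size in range(1, max_size + 1):
            for combo in combinations(cues, size):
                counts[frozenset(combo)] += 1
    n = max(len(emails), 1)
    return {pattern for pattern, c in counts.items() if c / n >= min_support}


def jaccard(a, b):
    """Jaccard overlap of two pattern sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Usage: compare the pattern sets mined from two small email collections.
sender_a = ["Hi! OK.", "Yes! GO."]
sender_b = ["I would like to schedule the quarterly planning session for Tuesday."]
similarity = jaccard(frequent_patterns(sender_a), frequent_patterns(sender_b))
```

A score near 1 means the two collections share most of their recurring cue combinations. In a fuller pipeline such pattern features would typically feed a trained classifier rather than a fixed overlap measure.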

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, X., Hao, P., Chandramouli, R., Subbalakshmi, K.P. (2011). Authorship Similarity Detection from Email Messages. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_28

  • DOI: https://doi.org/10.1007/978-3-642-23199-5_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23198-8

  • Online ISBN: 978-3-642-23199-5

  • eBook Packages: Computer Science, Computer Science (R0)
