Abstract
It is easy to hide the true identity of the author of an email. The author’s name, email address, and other header fields can be changed arbitrarily to deceive the recipient. For example, a sender can change his or her identity in the email header to send different emails to various recipients. Therefore, in this paper we investigate techniques for detecting authorship similarity from the text content of short, topic-free emails. We identify 150 stylistic cues for this problem and propose a method based on frequent patterns and machine learning. Extensive experimental results on the Enron email data set are also presented.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, X., Hao, P., Chandramouli, R., Subbalakshmi, K.P. (2011). Authorship Similarity Detection from Email Messages. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2011. Lecture Notes in Computer Science, vol 6871. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23199-5_28
DOI: https://doi.org/10.1007/978-3-642-23199-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23198-8
Online ISBN: 978-3-642-23199-5
eBook Packages: Computer Science, Computer Science (R0)