Abstract
This paper investigates the feasibility of predicting the gender of a text document’s author using linguistic evidence. For this purpose, term- and style-based classification techniques are evaluated over a large collection of chat messages. Prediction accuracies of up to 84.2% are achieved, illustrating the applicability of these techniques to gender prediction. Moreover, the reverse problem is examined, and the effect of gender on writing style is discussed.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kucukyilmaz, T., Cambazoglu, B.B., Aykanat, C., Can, F. (2006). Chat Mining for Gender Prediction. In: Yakhno, T., Neuhold, E.J. (eds) Advances in Information Systems. ADVIS 2006. Lecture Notes in Computer Science, vol 4243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11890393_29
DOI: https://doi.org/10.1007/11890393_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46291-0
Online ISBN: 978-3-540-46292-7
eBook Packages: Computer Science, Computer Science (R0)