Chat Mining for Gender Prediction

  • Conference paper
Advances in Information Systems (ADVIS 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4243)

Abstract

The aim of this paper is to investigate the feasibility of predicting the gender of a text document’s author using linguistic evidence. For this purpose, term- and style-based classification techniques are evaluated over a large collection of chat messages. Prediction accuracies of up to 84.2% are achieved, illustrating the applicability of these techniques to gender prediction. Moreover, the reverse problem is explored, and the effect of gender on writing style is discussed.
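To make the distinction between term-based and style-based classification concrete, the following is a minimal sketch and not the method used in the paper. It assumes a multinomial naive Bayes classifier over word tokens for the term-based view and a handful of surface statistics for the style-based view. The toy messages, the neutral class labels "A" and "B", and every function name are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Made-up messages with two placeholder author classes ("A", "B").
# This is illustrative toy data, not data from the paper.
TRAIN = [
    ("hey are you coming to the match tonight", "A"),
    ("the match was great, what a goal", "A"),
    ("omg that is sooo cute!!! :)", "B"),
    ("aww thank you!!! you are so sweet :)", "B"),
]


def term_features(message):
    """Term-based view: the lowercase word tokens of a message."""
    return re.findall(r"[a-z']+", message.lower())


def style_features(message):
    """Style-based view: a few surface statistics (an assumed feature set)."""
    words = message.split()
    return {
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "exclamations": message.count("!"),
        "smileys": len(re.findall(r"[:;]-?[)(DP]", message)),
    }


def train_naive_bayes(examples):
    """Collect per-class term counts, class frequencies, and the vocabulary."""
    term_counts = {}
    class_counts = Counter()
    for message, label in examples:
        class_counts[label] += 1
        term_counts.setdefault(label, Counter()).update(term_features(message))
    vocab = {term for counts in term_counts.values() for term in counts}
    return term_counts, class_counts, vocab


def predict(model, message):
    """Return the class with the highest Laplace-smoothed log-probability."""
    term_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total)
        denom = sum(term_counts[label].values()) + len(vocab)
        for term in term_features(message):
            score += math.log((term_counts[label][term] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, "what a great match"))
print(style_features("so sweet!!! :)"))
```

In a style-based setup, the dictionary returned by `style_features` would be fed to a classifier in place of the word tokens. The two views can also be combined, but whether that helps is an empirical question this sketch does not address.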

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kucukyilmaz, T., Cambazoglu, B.B., Aykanat, C., Can, F. (2006). Chat Mining for Gender Prediction. In: Yakhno, T., Neuhold, E.J. (eds) Advances in Information Systems. ADVIS 2006. Lecture Notes in Computer Science, vol 4243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11890393_29

  • DOI: https://doi.org/10.1007/11890393_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-46291-0

  • Online ISBN: 978-3-540-46292-7

  • eBook Packages: Computer Science, Computer Science (R0)
