Author Profiling: Predicting Gender from Document

  • Conference paper

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 37)

Abstract

As the Internet ages, a massive amount of data is being created on the web, most of it text. Attributing authorship of this content and predicting the characteristics of its authors is therefore becoming a new domain of data analytics, which makes Author Profiling a research area with a wide scope of possibilities and outcomes. The ability to describe the features or traits of an author has key applications in security and forensics. The PAN labs provide a platform for researchers by organizing author profiling tasks, for example, language variety identification and gender prediction. In this paper, we attempt to predict the gender of an author, using the English dataset of PAN 2017.
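This preview does not describe the features or classifier the authors actually use. To illustrate how gender prediction can be framed as binary text classification, here is a minimal sketch of multinomial Naive Bayes with add-one smoothing in plain Python. Naive Bayes is a substitute technique chosen for brevity, not the paper's method. The `tokenize` function, the `NaiveBayesGender` class, and the toy corpus of placeholder words are all hypothetical. A real experiment would train on the PAN 2017 English data instead.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase words, keeping a leading '#' or '@'
    # so that hashtags and mentions stay distinct features.
    return re.findall(r"[#@]?\w+", text.lower())


class NaiveBayesGender:
    """Hypothetical multinomial Naive Bayes classifier with add-one smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesGender":
        self.class_counts = Counter(labels)        # documents per class (priors)
        self.word_counts = defaultdict(Counter)    # token counts per class
        self.vocab: set[str] = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of every known token.
            score = math.log(count / n_docs)
            for tok in tokenize(doc):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log(
                        (self.word_counts[label][tok] + 1) / (self.total_words[label] + v)
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy corpus with placeholder words; a real run would load PAN 2017 English data.
train_docs = ["alpha beta beta", "alpha gamma", "delta epsilon", "epsilon zeta zeta"]
train_labels = ["female", "female", "male", "male"]
model = NaiveBayesGender().fit(train_docs, train_labels)
print(model.predict("beta gamma"))     # female
print(model.predict("zeta epsilon"))   # male
```

Naive Bayes is used here only because it fits in a few lines without dependencies. A linear classifier, such as logistic regression over TF-IDF features, would be an equally reasonable baseline for this kind of task.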

Author information

Correspondence to Sunakshi Mamgain.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Mamgain, S., Balabantaray, R.C., Das, A.K., Kumar, S. (2020). Author Profiling: Predicting Gender from Document. In: Borah, S., Emilia Balas, V., Polkowski, Z. (eds) Advances in Data Science and Management. Lecture Notes on Data Engineering and Communications Technologies, vol 37. Springer, Singapore. https://doi.org/10.1007/978-981-15-0978-0_10