Abstract
As the Internet ages, a massive amount of data is being created on the web, most of which is text. Consequently, determining the authorship of content and predicting the characteristics of its author is becoming a new domain of data analytics, making author profiling a research area with a wide scope of possibilities and outcomes. The ability to describe the features or traits of an author has key applications in many security and forensic areas. The PAN labs provide a platform for scholars by organizing author profiling tasks such as language and gender prediction. In this paper, we attempt to predict the gender of an author, using the English dataset of PAN 2017.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mamgain, S., Balabantaray, R.C., Das, A.K., Kumar, S. (2020). Author Profiling: Predicting Gender from Document. In: Borah, S., Emilia Balas, V., Polkowski, Z. (eds) Advances in Data Science and Management. Lecture Notes on Data Engineering and Communications Technologies, vol 37. Springer, Singapore. https://doi.org/10.1007/978-981-15-0978-0_10
DOI: https://doi.org/10.1007/978-981-15-0978-0_10
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0977-3
Online ISBN: 978-981-15-0978-0
eBook Packages: Intelligent Technologies and Robotics (R0)