Age and Gender Classification of Tweets Using Convolutional Neural Networks

  • Roy Khristopher Bayot
  • Teresa Gonçalves
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10710)

Abstract

Determining age and gender from a series of texts is useful in areas such as business intelligence and digital forensics. We explore the use of convolutional neural networks together with word2vec word embeddings for this task and compare them with handcrafted features. The network consists of five layers and is trained using Adadelta. It starts with an embedding layer, in which each word is represented by a vector, followed by a convolutional layer composed of three filters, each with 100 feature maps. Max-over-time pooling is then applied to each feature map, and the resulting features are concatenated and passed through a dropout layer and a softmax layer. The network was trained to classify age and gender for English and Spanish tweets. The per-tweet predictions were aggregated by majority vote, which gives the final prediction for the user who wrote the tweets. The results outperform previous experiments. The highest accuracies obtained for English are 49.6% for age and 72.1% for gender; for Spanish, they are 56.0% for age and 69.3% for gender.
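The pipeline described in the abstract (convolution with several filters, max-over-time pooling on each feature map, concatenation, dropout, softmax, then a majority vote over a user's tweets) can be sketched in plain Python. This is a minimal, inference-only illustration under stated assumptions, not the authors' implementation: the filter widths 3/4/5 (borrowed from Kim [8]), the ReLU activation, zero padding of short tweets, the toy dimensions, the bias-free softmax layer, the random weights standing in for a model trained with Adadelta, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

# Toy sizes for illustration only (assumptions, not the paper's settings):
# the paper uses word2vec embeddings and 100 feature maps per filter.
EMBED_DIM = 4
FILTER_WIDTHS = (3, 4, 5)  # assumed widths, following Kim (2014)
MAPS_PER_FILTER = 2
NUM_CLASSES = 2            # e.g. the two gender labels


def conv_feature_map(sentence, kernel, bias):
    """One feature map: slide a (width x EMBED_DIM) kernel over the word vectors.

    The ReLU activation is an assumption; the abstract does not name one.
    """
    width = len(kernel)
    fmap = []
    for start in range(len(sentence) - width + 1):
        s = bias
        for i in range(width):
            for d in range(EMBED_DIM):
                s += kernel[i][d] * sentence[start + i][d]
        fmap.append(max(0.0, s))
    return fmap


def max_over_time(fmap):
    """Keep only the strongest activation of a feature map, whatever its position."""
    return max(fmap)


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(seed=0):
    """Random weights standing in for trained ones (training with Adadelta is omitted)."""
    rng = random.Random(seed)
    conv = {
        w: [([[rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)] for _ in range(w)], 0.0)
            for _ in range(MAPS_PER_FILTER)]
        for w in FILTER_WIDTHS
    }
    n_features = len(FILTER_WIDTHS) * MAPS_PER_FILTER
    # Softmax-layer weights; a bias term is omitted for brevity (simplification).
    dense = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)] for _ in range(NUM_CLASSES)]
    return conv, dense


def predict_tweet(sentence, conv, dense):
    """Class index for one tweet, given as a list of word vectors."""
    # Zero-pad short tweets so that every filter width fits at least once.
    pad = max(0, max(FILTER_WIDTHS) - len(sentence))
    padded = list(sentence) + [[0.0] * EMBED_DIM for _ in range(pad)]
    # Max-over-time pooling on each map, then concatenate into one feature vector.
    features = [max_over_time(conv_feature_map(padded, kernel, bias))
                for w in FILTER_WIDTHS for kernel, bias in conv[w]]
    # With inverted dropout (as in Keras), dropout is the identity at prediction time.
    logits = [sum(wt * f for wt, f in zip(row, features)) for row in dense]
    probs = softmax(logits)
    return max(range(NUM_CLASSES), key=lambda c: probs[c])


def predict_user(tweets, conv, dense):
    """Majority vote over per-tweet predictions; ties go to the label seen first."""
    votes = Counter(predict_tweet(t, conv, dense) for t in tweets)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(1)
    # A hypothetical user with five tweets of random length and random word vectors.
    user_tweets = [[[rng.gauss(0, 1) for _ in range(EMBED_DIM)]
                    for _ in range(rng.randint(1, 12))] for _ in range(5)]
    conv, dense = init_params()
    print([predict_tweet(t, conv, dense) for t in user_tweets])
    print("user label:", predict_user(user_tweets, conv, dense))
```

Run as a script, it prints the per-tweet labels and the user-level vote for a randomly generated user; with untrained weights the labels carry no meaning. In a real setup the embedding layer would look up pretrained word2vec vectors (for instance built with a library such as gensim [23]) and the weights would be learned rather than sampled.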

Keywords

Author profiling · Twitter · Word vectors · Word2vec · Convolutional neural networks

Acknowledgments

The authors would like to thank FCT (Fundação para a Ciência e a Tecnologia) for partially supporting this research through the LISP research center (UID/CEC/4668/2016).

References

  1. Álvarez-Carmona, M.A., López-Monroy, A.P., Montes-y Gómez, M., Villaseñor-Pineda, L., Escalante, H.J.: INAOE’s participation at PAN 2015: author profiling task. In: Working Notes Papers of the CLEF 2015 Evaluation Labs (2015)
  2. Argamon, S., Koppel, M., Pennebaker, J.W., Schler, J.: Automatically profiling the author of an anonymous text. Commun. ACM 52(2), 119–123 (2009)
  3. Bayot, R., Gonçalves, T.: Author profiling using SVMs and word embedding averages. Notebook for PAN at CLEF 2016. In: Balog et al. [22]
  4. Chollet, F.: Keras (2015). https://github.com/fchollet/keras
  5. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12(Aug), 2493–2537 (2011)
  6. González-Gallardo, C.E., Montes, A., Sierra, G., Antonio Núñez-Juárez, J., Salinas-López, A.J., Ek, J.: Tweets classification using corpus dependent tags, character and POS N-grams. In: Proceedings of CLEF (2015)
  7. Halliday, M., Matthiessen, C.M.: An Introduction to Functional Grammar. Routledge, Abingdon (2014)
  8. Kim, Y.: Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014)
  9. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
  10. Li, X., Roth, D.: Learning question classifiers. In: Proceedings of the 19th International Conference on Computational Linguistics, Volume 1, pp. 1–7. Association for Computational Linguistics (2002)
  11. López-Monroy, A.P., Montes-y Gómez, M., Escalante, H.J., Villaseñor-Pineda, L., Villatoro-Tello, E.: INAOE’s participation at PAN 2013: author profiling task. In: CLEF 2013 Evaluation Labs and Workshop (2013)
  12. López-Monroy, A.P., Montes-y Gómez, M., Escalante, H.J., Villaseñor-Pineda, L.: Using intra-profile information for author profiling. In: CLEF (Working Notes), pp. 1116–1120 (2014)
  13. Maharjan, S., Shrestha, P., Solorio, T.: A simple approach to author profiling in MapReduce. In: CLEF (Working Notes), pp. 1121–1128 (2014)
  14. Marquardt, J., Farnadi, G., Vasudevan, G., Moens, M.-F., Davalos, S., Teredesai, A., De Cock, M.: Age and gender identification in social media. In: Proceedings of CLEF 2014 Evaluation Labs (2014)
  15. Meina, M., Brodzinska, K., Celmer, B., Czoków, M., Patera, M., Pezacki, J., Wilk, M.: Ensemble-based classification for author profiling using various features. In: Notebook Papers of CLEF (2013)
  16. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
  17. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)
  18. Pang, B., Lee, L.: A sentimental education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, p. 271. Association for Computational Linguistics (2004)
  19. Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pp. 115–124. Association for Computational Linguistics (2005)
  20. Rangel, F., Rosso, P., Koppel, M.M., Stamatatos, E., Inches, G.: Overview of the author profiling task at PAN 2013. In: CLEF Conference on Multilingual and Multimodal Information Access Evaluation, pp. 352–365. CELCT (2013)
  21. Rangel, F., Rosso, P., Potthast, M., Stein, B., Daelemans, W.: Overview of the 3rd author profiling task at PAN 2015. In: Cappellato, L., Ferro, N., Jones, G.J.F., San Juan, E. (eds.) CLEF 2015 Labs and Workshops, Notebook Papers, vol. 1391 (2015)
  22. Rangel, F., Rosso, P., Verhoeven, B., Daelemans, W., Potthast, M., Stein, B.: Overview of the 4th author profiling task at PAN 2016. In: Balog, K., Cappellato, L., Ferro, N., Macdonald, C. (eds.) Working Notes Papers of the CLEF 2016 Evaluation Labs. CEUR Workshop Proceedings, vol. 1609, pp. 750–784. CLEF and CEUR-WS.org (September 2016)
  23. Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, Valletta, Malta, pp. 45–50. ELRA (May 2010). http://is.muni.cz/publication/884893/en
  24. Santosh, K., Bansal, R., Shekhar, M., Varma, V.: Author profiling: predicting age and gender from blogs. In: Notebook Papers of CLEF (2013)
  25. Schler, J., Koppel, M., Argamon, S., Pennebaker, J.W.: Effects of age and gender on blogging. In: AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, vol. 6, pp. 199–205 (2006)
  26. Shen, Y., He, X., Gao, J., Deng, L., Mesnil, G.: Learning semantic representations using convolutional neural networks for web search. In: Proceedings of the 23rd International Conference on World Wide Web, pp. 373–374. ACM (2014)
  27. Theano Development Team: Theano: a Python framework for fast computation of mathematical expressions. arXiv e-prints abs/1605.02688 (May 2016)
  28. Villena Román, J., González Cristóbal, J.C.: Daedalus at PAN 2014: guessing tweet author’s gender and age (2014)
  29. Weren, E.R.D., Moreira, V.P., de Oliveira, J.P.M.: Exploring information retrieval features for author profiling. In: CLEF (Working Notes), pp. 1164–1171 (2014)
  30. Wiebe, J., Wilson, T., Cardie, C.: Annotating expressions of opinions and emotions in language. Lang. Resour. Eval. 39(2–3), 165–210 (2005)
  31. Yih, W.-T., He, X., Meek, C.: Semantic parsing for single-relation question answering. In: ACL, vol. 2, pp. 643–648. Citeseer (2014)

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. LISP - Laboratório de Informática, Sistemas e Paralelismo, Universidade de Évora, Évora, Portugal