Author Profiling in Social Media: The Impact of Emotions on Discourse Analysis

  • Paolo Rosso
  • Francisco Rangel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10583)


In this paper we summarise the content of the keynote that will be given at the 5th International Conference on Statistical Language and Speech Processing (SLSP) in Le Mans, France, on October 23–25, 2017. In the keynote we will address the importance of inferring demographic information for marketing and security purposes. The aim is to model how language is shared within gender and age groups, taking into account its statistical usage. We will see how a shallow discourse analysis can be carried out on the basis of a graph-based representation in order to extract information such as how complicated the discourse is (i.e., how connected the graph is), how interconnected grammatical categories are, how far a grammatical category is from the others, how different grammatical categories are related to each other, how the discourse is modelled in different structural or stylistic units, which grammatical categories are most central in the discourse of a demographic group, which connectors are most common in the linguistic structures used, etc. Moreover, we will also see the importance of considering emotions in the shallow discourse analysis and the impact that this has. We carried out experiments on identifying gender and age, in both Spanish and English, using the PAN-AP-13 and PAN-PC-14 corpora, obtaining results comparable to those of the best-performing systems of the PAN Lab at CLEF.
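
As a rough illustration of the kind of graph-based measures mentioned above, the following minimal sketch builds a graph whose nodes are grammatical categories and whose edges link categories that occur consecutively in a text, then computes how connected the graph is (density) and how central each category is (degree centrality). It is not the EmoGraph construction itself, which this summary does not specify: the function names, the toy part-of-speech tags, and the choice of adjacency as the edge criterion are all assumptions made for the example.

```python
from collections import defaultdict


def build_pos_graph(tagged_sentences):
    """Build a directed graph with an edge a -> b for each pair of
    consecutive tags. Edge weights count how often the pair occurs.
    (Hypothetical helper; adjacency as edge criterion is an assumption.)"""
    edges = defaultdict(int)
    nodes = set()
    for tags in tagged_sentences:
        nodes.update(tags)
        for a, b in zip(tags, tags[1:]):
            edges[(a, b)] += 1
    return nodes, dict(edges)


def density(nodes, edges):
    """Fraction of possible directed edges (self-loops excluded) present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    distinct = sum(1 for (a, b) in edges if a != b)
    return distinct / (n * (n - 1))


def degree_centrality(nodes, edges):
    """Share of the other nodes each node is linked to, ignoring direction."""
    n = len(nodes)
    neighbours = {v: set() for v in nodes}
    for (a, b) in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    if n < 2:
        return {v: 0.0 for v in nodes}
    return {v: len(neighbours[v]) / (n - 1) for v in nodes}


# Toy input: part-of-speech tag sequences for two short sentences.
sentences = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["PRON", "VERB", "ADJ"],
]
nodes, edges = build_pos_graph(sentences)
graph_density = density(nodes, edges)
centrality = degree_centrality(nodes, edges)
most_central = max(centrality, key=centrality.get)
```

In this toy input the verb category is linked to every other category, so it comes out as the most central node. Measures such as betweenness centrality or clustering could be computed on the same structure.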


Author profiling, Graph-based representation, Shallow discourse analysis, EmoGraph



We thank the SLSP Conference for the invitation for giving the keynote on Author Profiling in Social Media. The research work described in this paper was partially carried out in the framework of the SomEMBED project (TIN2015-71147-C2-1-P), funded by the Spanish Ministry of Economy, Industry and Competitiveness (MINECO).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. PRHLT Research Center, Universitat Politècnica de València, Valencia, Spain
  2. Autoritas Consulting, Valencia, Spain
