Inferring Users’ Gender from Interests: A Tag Embedding Approach

  • Peisong Zhu
  • Tieyun Qian (corresponding author)
  • Ming Zhong
  • Xuhui Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9950)


This paper studies the problem of predicting the gender of social media users from their interest tags. The challenge is that the tag feature vector is extremely sparse and short: each user has fewer than 10 tags. We present a novel conceptual-class-based method that enriches and centralizes the feature space. We first identify the discriminating tags based on the tag distribution. We then build the initial conceptual classes by taking advantage of the generalization and specification operations on these tags. For example, “Kobe” is a specialized instance of “basketball”. Finally, we model class expansion as the problem of computing the similarity, in the embedding space, between one tag and the set of tags in a conceptual class.

We conduct extensive experiments on a real dataset from Sina Weibo. The results demonstrate that the proposed method significantly enhances the quality of the feature space and improves the performance of gender classification. Its accuracy reaches 82.25%, whereas the original tag vector achieves only 62.75%.
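
The abstract does not state how discriminating tags are selected or which similarity function drives class expansion, so the following is only a minimal sketch under stated assumptions. It assumes a tag is discriminating when a large share of its users have the same gender, and that the similarity between a tag and a conceptual class is the mean cosine similarity to the class members. All function names, thresholds, and the toy embedding vectors are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def discriminating_tags(users, min_count=2, min_ratio=0.8):
    """Return {tag: gender} for tags skewed toward one gender.

    users: list of (gender, tags) pairs with gender in {"m", "f"}.
    The ratio test is an assumed criterion, not the paper's.
    """
    counts = {"m": Counter(), "f": Counter()}
    for gender, tags in users:
        counts[gender].update(set(tags))  # count each tag once per user
    result = {}
    for tag in set(counts["m"]) | set(counts["f"]):
        m, f = counts["m"][tag], counts["f"][tag]
        total = m + f
        if total < min_count:
            continue  # too rare to judge
        if m / total >= min_ratio:
            result[tag] = "m"
        elif f / total >= min_ratio:
            result[tag] = "f"
    return result


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def class_similarity(tag, members, emb):
    """Mean cosine similarity between one tag and a set of class tags
    (an assumed aggregation)."""
    vecs = [emb[m] for m in members if m in emb]
    if tag not in emb or not vecs:
        return 0.0
    return sum(cosine(emb[tag], v) for v in vecs) / len(vecs)


def expand_class(members, candidates, emb, threshold=0.7):
    """Return the candidates that are similar enough to join the class."""
    return {
        t for t in candidates
        if t not in members and class_similarity(t, members, emb) >= threshold
    }


# Toy data (made up for illustration only).
toy_users = [
    ("m", ["basketball", "music"]),
    ("m", ["basketball"]),
    ("f", ["makeup", "music"]),
    ("f", ["makeup"]),
]
toy_emb = {
    "basketball": [1.0, 0.1],
    "Kobe": [0.9, 0.2],
    "NBA": [0.95, 0.15],
    "makeup": [0.1, 1.0],
}
skewed = discriminating_tags(toy_users)
added = expand_class({"basketball", "Kobe"}, ["NBA", "makeup"], toy_emb)
```

On this toy data, "music" is used equally by both genders and is therefore not selected, while "NBA" lies close to both members of the sports class and is added to it. In practice the embeddings would come from a trained model such as the one in Mikolov et al. rather than from hand-written vectors.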


Keywords: Gender classification; Users’ interests; Conceptual class



The work described in this paper has been supported in part by the NSFC Projects (61272275, 61572376, 61272110) and the Wuhan Science and Technology Bureau “Chenguang Jihua” (2014072704011250).



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Peisong Zhu (1)
  • Tieyun Qian (1, corresponding author)
  • Ming Zhong (1)
  • Xuhui Li (2)
  1. State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China
  2. School of Information Management, Wuhan University, Wuhan, China
