Say It with Colors: Language-Independent Gender Classification on Twitter

Online Social Media Analysis and Visualization

Part of the book series: Lecture Notes in Social Networks (LNSN)

Abstract

Online Social Networks (OSNs) have spread at stunning speed over the past decade and are now part of the lives of tens of millions of people. The rise of OSNs has stretched the traditional notion of community to include groups of people who have never met in person but who communicate through OSNs to share knowledge, opinions, interests, and activities. Here we explore language-independent gender classification in depth. Our approach predicts gender using five color-based features extracted from Twitter profiles, such as the background color of a user’s profile page. This contrasts with most existing methods for gender prediction, which are language dependent: they use high-dimensional spaces consisting of unique words extracted from text fields such as postings, user names, and profile descriptions. Our approach is independent of the user’s language, efficient, scalable, and computationally tractable, while attaining a good level of accuracy.
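
As a rough illustration of why color-based features are compact and language independent, the sketch below turns five profile colors into a short numeric vector. It is not the chapter's implementation: the abstract names only the background color, so the other four field names, the hex-string input format, and the RGB encoding are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical field names. Only the background color is named in the
# abstract; the remaining four are placeholders for illustration.
COLOR_FIELDS = [
    "background_color",
    "text_color",
    "link_color",
    "sidebar_fill_color",
    "sidebar_border_color",
]


def hex_to_rgb(hex_color: str) -> Tuple[int, int, int]:
    """Convert a 6-digit hex color such as 'C0DEED' or '#C0DEED' to (r, g, b)."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_color!r}")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return (r, g, b)


def color_features(profile: Dict[str, str]) -> List[int]:
    """Flatten the five profile colors into a 15-element numeric vector."""
    features: List[int] = []
    for field in COLOR_FIELDS:
        features.extend(hex_to_rgb(profile[field]))
    return features
```

A vector like this has a fixed, small dimension regardless of the language a user writes in, whereas a word-based representation grows with the vocabulary. Any standard classifier could then be trained on such vectors; which one is used, and how the colors are actually encoded, is left to the chapter itself.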

Author information

Corresponding author

Correspondence to Ugo A. Buy.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Alowibdi, J.S., Buy, U.A., Yu, P.S. (2014). Say It with Colors: Language-Independent Gender Classification on Twitter. In: Kawash, J. (eds) Online Social Media Analysis and Visualization. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-13590-8_3

  • DOI: https://doi.org/10.1007/978-3-319-13590-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13589-2

  • Online ISBN: 978-3-319-13590-8

  • eBook Packages: Computer Science, Computer Science (R0)
