Abstract
In this paper we try, for the first time, to shed light on the use of Twitter by Italian-speaking users, quantifying the total audience and some of its relevant characteristics, in particular gender and location. The attempt is based on data from the publicly available APIs, referring both to profile documents and to tweets. Through real-time calculation it is possible to infer gender mainly from the name field of the user's profile, while geo-location is deduced from the location field and from geotagged tweets.
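The two profile-field heuristics described above can be illustrated with a minimal sketch: gender is looked up from tokens of the name field, and the location field is matched against a list of municipalities. The sketch assumes a hypothetical five-entry first-name lexicon and a hypothetical five-entry excerpt of the ISTAT municipality list mentioned in the notes; the names `infer_gender` and `infer_municipality` are illustrative and are not the authors' code. Geotagged tweets and the real-time processing are not modelled.

```python
import re
import unicodedata
from typing import Optional


def normalize(text: str) -> str:
    """Lower-case, strip accents and collapse whitespace ('  Forlì ' -> 'forli')."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


# Hypothetical first-name lexicon; a real one would list thousands of Italian names.
FIRST_NAMES = {
    "giulia": "F",
    "maria": "F",
    "marco": "M",
    "luca": "M",
    "andrea": "M",  # male in Italian usage
}

# Hypothetical excerpt of the ISTAT municipality list (the full list has 7978 names),
# keyed by normalized form so that 'forli' and 'Forlì' both resolve to 'Forlì'.
MUNICIPALITIES = {
    normalize(m): m for m in ["Roma", "Milano", "Napoli", "Forlì", "Bolzano/Bozen"]
}


def infer_gender(name_field: str) -> Optional[str]:
    """Return 'F' or 'M' for the first token of the name field found in the lexicon."""
    for token in re.findall(r"[^\W\d_]+", name_field):  # runs of letters only
        gender = FIRST_NAMES.get(normalize(token))
        if gender is not None:
            return gender
    return None  # no known first name: gender left undetermined


def infer_municipality(location_field: str) -> Optional[str]:
    """Return the official municipality name found in a free-text location field."""
    # Try the whole field first, then each part split on ',', ';' or '|'.
    # '/' is not used as a separator because bilingual names contain it.
    for candidate in [location_field] + re.split(r"[,;|]", location_field):
        official = MUNICIPALITIES.get(normalize(candidate))
        if official is not None:
            return official
    return None
```

For instance, `infer_gender("Dr. Marco B.")` skips the unknown token "Dr" and returns "M", and `infer_municipality("Roma, Italia")` returns "Roma". Trying the whole field before splitting lets multi-part names match as a unit; a fuller version would still have to deal with ambiguous place names and with fields that mention several places.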
Notes
- 1.
According to Alexa.
- 2.
- 3.
For further information see [22] and the references therein.
- 4.
- 5.
- 6.
- 7.
The enterprises whose websites were scraped in the cited study were the majority (64%) of the enterprises (with 10 or more employees) having a website, but only half of these enterprises presented links to social media.
- 8.
For example, consider a company named “rossi” and the username “alexRossi”. The username contains the company name, but the remaining letters (“alex”) can be interpreted as a male first name, so the username is not labelled as a company.
- 9.
That is, the list of municipalities of the Italian National Institute of Statistics (ISTAT), containing 7978 Italian municipalities.
- 10.
We also tried to determine the users' profession from the bio field, using a list of roughly 1000 professions. The results were not at all satisfactory, possibly because the bio field is a free-text field that each user interprets in her own way.
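The company-account rule in note 8 can be sketched as follows: a username is labelled as a company if it contains a company name, unless the letters left after removing that name form a known first name. This is a minimal illustration, assuming a hypothetical first-name set and a hypothetical function `is_company_username`; the authors' actual matching procedure and lists may differ.

```python
from typing import Iterable

# Hypothetical first-name list; a real one would be much larger.
FIRST_NAMES = {"alex", "giulia", "luca", "marco"}


def is_company_username(username: str, company_names: Iterable[str]) -> bool:
    """Label a username as a company account if it contains a company name,
    unless the leftover letters form a known first name (e.g. 'alexRossi')."""
    user = username.lower()
    for company in company_names:
        name = company.lower()
        if not name or name not in user:
            continue
        # Drop the company name once and keep only the remaining letters.
        leftover = "".join(ch for ch in user.replace(name, "", 1) if ch.isalpha())
        if leftover in FIRST_NAMES:
            continue  # more plausibly a person, e.g. Alex Rossi
        return True
    return False
```

With the example from the note, `is_company_username("alexRossi", ["rossi"])` returns `False`, while `"rossi_spa"` is labelled as a company. Using `continue` instead of returning `False` when a first name is left over lets a different company name in the list still match the same username.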
References
Barcaroli, G., Bianchi, G., Nurra, A.: Internet as a data source: ICT use of enterprises: web ordering, job advertising and presence on social media. In: Big Data Committee Annual Report 2017, ISTAT. https://www.istat.it/it/files//2018/09/Big-data-committee.pdf (2018)
Bollen, J., Mao, H., Zeng, X.: Twitter mood predicts the stock market. http://arxiv.org/abs/1010.3003 (2010)
Burger, J.D., Henderson, J., Kim, G., Zarrella, G.: Discriminating gender on Twitter. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, pp. 1301–1309, Stroudsburg, PA, USA. Association for Computational Linguistics. ISBN 978-1-937284-11-4. http://dl.acm.org/citation.cfm?id=2145432.2145568 (2011)
Censis. 13° Rapporto Censis-Ucsi sulla comunicazione. I media tra élite e popolo. http://www.censis.it/17?shadow_pubblicazione=120570 (2016)
Chang, J., Rosenn, I., Backstrom, L., Marlow, C.: ePluribus: Ethnicity on social networks. In: ICWSM (2010)
Cheng, Z., Caverlee, J., Lee, K.: You are where you tweet: a content-based approach to geo-locating Twitter users. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM ’10, New York, NY, USA, pp. 759–768. ACM. ISBN 978-1-4503-0099-5. https://doi.org/10.1145/1871437.1871535 (2010)
Chu, Z., Gianvecchio, S., Wang, H., Jajodia, S.: Who is tweeting on Twitter: human, bot, or cyborg? In: Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC ’10, New York, NY, USA, pp. 21–30. ACM. ISBN 978-1-4503-0133-6. https://doi.org/10.1145/1920261.1920265 (2010)
Culotta, A., Ravi, N.K., Cutler, J.: Predicting the demographics of Twitter users from website traffic data. In: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI’15, pp. 72–78. AAAI Press. ISBN 0-262-51129-0. http://dl.acm.org/citation.cfm?id=2887007.2887018 (2015)
Daas, P.J., Burger, J., Le, Q., ten Bosch, O., Puts, M.J.: Profiling of Twitter Users: A Big Data Selectivity Study (2016)
Della Ratta, F., Pontecorvo, M.E., Vaccari, C., Virgillito, A.: Big data and textual analysis: a corpus selection from Twitter. Rome between the fear of terrorism and the Jubilee. https://www.researchgate.net/publication/303843023_Big_data_and_textual_analysis_a_corpus_selection_from_Twitter_Rome_between_the_fear_of_terrorism_and_the_Jubilee (2016)
Gurajala, S., White, J.S., Hudson, B., Matthews, J.N.: Fake Twitter accounts: profile characteristics obtained using an activity-based pattern detection approach. In: Proceedings of the 2015 International Conference on Social Media & Society, SMSociety ’15, New York, NY, USA, pp. 9:1–9:7. ACM. ISBN 978-1-4503-3923-0. https://doi.org/10.1145/2789187.2789206 (2015)
Huang, W., Weber, I., Vieweg, S.: Inferring nationalities of Twitter users and studying inter-national linking. In: Proceedings of the 25th ACM Conference on Hypertext and Social Media, HT ’14, New York, NY, USA, pp. 237–242. ACM. ISBN 978-1-4503-2954-5. https://doi.org/10.1145/2631775.2631825 (2014)
ICTGlobus. Social media in italia: analisi dei flussi di utilizzo del 2016. https://www.ictglobus.com/social-media-in-italia-analisi-dei-flussi-di-utilizzo-del-2016/ (2017)
Ikeda, K., Hattori, G., Matsumoto, K., Ono, C., Higashino, T.: Demographic estimation of Twitter users for marketing analysis. IPSJ Trans. Consum. Devices Syst. 2(1), 82–93 (2012)
Ikeda, K., Hattori, G., Ono, C., Asoh, H., Higashino, T.: Twitter user profiling based on text and community mining for market analysis. Knowl.-Based Syst. 51(1), 35–47. ISSN 0950-7051. https://doi.org/10.1016/j.knosys.2013.06.020 (2013)
Ito, J., Nishida, K., Hoshide, T., Toda, H., Uchiyama, T.: Demographic and psychographic estimation of Twitter users using social structures, pp. 27–46. Springer International Publishing, Cham (2014). ISBN 978-3-319-13590-8. https://doi.org/10.1007/978-3-319-13590-8_2
Lee, K., Caverlee, J., Webb, S.: Uncovering social spammers: Social honeypots + machine learning. In: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’10, pp. 435–442, New York, NY, USA. ACM (2010). ISBN 978-1-4503-0153-4. https://doi.org/10.1145/1835449.1835522
Liu, W., Ruths, D.: What’s in a name? Using first names as features for gender inference in Twitter. In: AAAI spring symposium: Analyzing microtext, vol. 13, p. 01 (2013)
Mislove, A., Jørgensen, S., Ahn, Y.-Y., Onnela, J.-P., Rosenquist, J.: Understanding the demographics of Twitter users, pp. 554–557. AAAI Press (2011). ISBN 978-1-57735-505-2
Mohammady, E., Culotta, A.: Using county demographics to infer attributes of Twitter users. ACL 2014, 7 (2014)
Nguyen, D., Smith, N.A., Rosé, C.P.: Author age prediction from text using linear regression. In: Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, LaTeCH ’11, pp. 115–123, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics. ISBN 9781937284046. http://dl.acm.org/citation.cfm?id=2107636.2107651
Paquet-Clouston, M., Bilodeau, O., Décary-Hétu, D.: Can we trust social media data? Social network manipulation by an IoT botnet. In: Proceedings of the 8th International Conference on Social Media & Society, #SMSociety17, pp. 15:1–15:9, New York, NY, USA. ACM. ISBN 978-1-4503-4847-8. https://doi.org/10.1145/3097286.3097301 (2017)
Pennacchiotti, M., Popescu, A.-M.: A machine learning approach to twitter user classification. In: ICWSM (2011)
Preotiuc-Pietro, D., Volkova, S., Lampos, V., Bachrach, Y., Aletras, N.: Studying user income through language, behaviour and affect in social media. PLOS One 10(9), 1–17 (2015). https://doi.org/10.1371/journal.pone.0138717
Rao, D., Yarowsky, D., Shreevats, A., Gupta, M.: Classifying latent user attributes in Twitter. In: Proceedings of the 2nd International Workshop on Search and Mining User-generated Contents, SMUC ’10, pp. 37–44, New York, NY, USA. ACM. ISBN 978-1-4503-0386-6. https://doi.org/10.1145/1871985.1871993 (2010)
Rao, D., Paul, M.J., Fink, C., Yarowsky, D., Oates, T., Coppersmith, G.: Hierarchical Bayesian models for latent attribute detection in social media. In: Adamic, L.A., Baeza-Yates, R.A., Counts, S. (eds.) ICWSM. The AAAI Press. http://dblp.uni-trier.de/db/conf/icwsm/icwsm2011.html#RaoPFYOC11 (2011)
Sakaki, S., Miura, Y., Ma, X., Hattori, K., Ohkuma, T.: Twitter user gender inference using combined analysis of text and image processing. V&L Net 2014, 54 (2014)
Schwartz, H.A., Eichstaedt, J.C., Kern, M.L., Dziurzynski, L., Lucas, R.E., Agrawal, M., Park, G.J., Lakshmikanth, S.K., Jha, S., Seligman, M.E. et al.: Characterizing geographic variation in well-being using tweets. In: ICWSM (2013)
Sloan, L.: Who tweets in the United Kingdom? Profiling the Twitter population using the British Social Attitudes Survey 2015. Social Media + Society, 3(1), 2056305117698981 (2017). https://doi.org/10.1177/2056305117698981
Sloan, L., Morgan, J.: Who tweets with their location? Understanding the relationship between demographic characteristics and the use of geoservices and geotagging on Twitter. PLOS One 10(11), 1–15 (2015). https://doi.org/10.1371/journal.pone.0142209
Sloan, L., Morgan, J., Burnap, P., Williams, M.: Who tweets? Deriving the demographic characteristics of age, occupation and social class from Twitter user meta-data. PLOS One 10(3), 1–20 (2015). https://doi.org/10.1371/journal.pone.0115545
Varol, O., Ferrara, E., Davis, C.A., Menczer, F., Flammini, A.: Online human-bot interactions: detection, estimation, and characterization. CoRR abs/1703.03107, http://arxiv.org/abs/1703.03107 (2017)
Zamal, F.A., Liu, W., Ruths, D.: Homophily and latent attribute inference: inferring latent attributes of Twitter users from neighbors. In: Breslin, J.G., Ellison, N.B., Shanahan, J.G., Tufekci, Z. (eds.) ICWSM. The AAAI Press. http://dblp.uni-trier.de/db/conf/icwsm/icwsm2012.html#ZamalLR12 (2012)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Alessandra, R., Gentile, M.M., Bianco, D.M. (2019). Who Tweets in Italian? Demographic Characteristics of Twitter Users. In: Petrucci, A., Racioppi, F., Verde, R. (eds) New Statistical Developments in Data Science. SIS 2017. Springer Proceedings in Mathematics & Statistics, vol 288. Springer, Cham. https://doi.org/10.1007/978-3-030-21158-5_25
DOI: https://doi.org/10.1007/978-3-030-21158-5_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21157-8
Online ISBN: 978-3-030-21158-5
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)