Inference of node attributes from social network assortativity

  • Dounia Mulders
  • Cyril de Bodt
  • Johannes Bjelland
  • Alex Pentland
  • Michel Verleysen
  • Yves-Alexandre de Montjoye
WSOM 2017


Social networks are known to be assortative with respect to many attributes, such as age, weight, wealth, level of education, ethnicity and gender: people who are similar according to these attributes tend to be more connected. This can be explained by influence and homophily. Whatever its origin, this assortativity provides information about each node given its neighbors. Assortativity can thus be used to improve individual predictions in a broad range of situations where data are missing or inaccurate. This paper presents a general framework based on probabilistic graphical models that exploits social network structure to improve individual predictions of node attributes. Using this framework, we quantify the assortativity range leading to an accuracy gain in several situations, with various individual prediction profiles. Finally, we show how specific characteristics of the network can further enhance performance. For instance, the gender assortativity in real-world mobile phone data changes drastically according to some communication attributes. In this case, using the network topology improves local predictions of node labels and also enables inferring missing node labels based on a subset of known vertices. In both cases, the performance of the proposed method is statistically significantly superior to that achieved by state-of-the-art label propagation and feature extraction schemes in most settings.
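
To make the notion of assortativity concrete, the following minimal sketch computes an assortativity coefficient for a categorical node attribute from an undirected edge list, using the standard mixing-matrix definition r = (Σ e_ii − Σ a_i²) / (1 − Σ a_i²). It only illustrates the quantity discussed in the abstract and is not the framework proposed in the paper; the function name `attribute_assortativity` and the toy graph are hypothetical.

```python
from collections import Counter


def attribute_assortativity(edges, labels):
    """Assortativity coefficient of a categorical node attribute.

    edges  -- iterable of undirected edges (u, v)
    labels -- dict mapping each node to its attribute value
    Hypothetical helper for illustration only.
    """
    mixing = Counter()
    for u, v in edges:
        # Count each undirected edge in both directions so that the
        # mixing matrix is symmetric.
        mixing[(labels[u], labels[v])] += 1
        mixing[(labels[v], labels[u])] += 1

    total = sum(mixing.values())
    if total == 0:
        raise ValueError("the edge list is empty")

    values = {a for a, _ in mixing}
    # a[i]: fraction of edge ends attached to nodes with value i.
    a = {
        i: sum(c for (p, _), c in mixing.items() if p == i) / total
        for i in values
    }
    # Fraction of edges joining two nodes with the same value.
    trace = sum(mixing[(i, i)] for i in values) / total
    # Same fraction expected if edge ends were paired at random.
    expected = sum(x * x for x in a.values())
    if expected == 1.0:
        raise ValueError("undefined when all edge ends share one value")
    return (trace - expected) / (1.0 - expected)


# Toy graph (assumed data): two nodes per gender label.
toy_labels = {0: "F", 1: "F", 2: "M", 3: "M"}
same_only = [(0, 1), (2, 3)]   # only same-label edges
cross_only = [(0, 2), (1, 3)]  # only cross-label edges
print(attribute_assortativity(same_only, toy_labels))
print(attribute_assortativity(cross_only, toy_labels))
```

A positive value indicates that same-label nodes are connected more often than expected under random pairing, a negative value indicates the opposite, and a value near zero means the neighbors' labels carry little information about a node's own label.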


Keywords: Loopy belief propagation, Assortativity, Homophily, Social networks, Mobile phone metadata



DM and CdB are Research Fellows of the Fonds de la Recherche Scientifique - FNRS. The authors gratefully acknowledge Pål Roe Sundsøy for his help with the data.

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. ICTEAM Institute, Université catholique de Louvain, Ottignies-Louvain-la-Neuve, Belgium
  2. Telenor Research, Fornebu, Norway
  3. MIT Media Lab, Massachusetts Institute of Technology, Cambridge, USA
  4. Department of Computing, Data Science Institute, Imperial College London, London, UK
