Introduction to Social Network Analysis

O’Malley, Alistair James; Onnela, Jukka-Pekka

doi:10.1007/978-1-4939-8715-3_37

Part of the book series: Health Services Research (HEALTHSR)

Abstract

This chapter introduces statistical methods used in the analysis of social networks and in the rapidly evolving parallel field of network science. Several social network analyses in health services research have appeared recently. However, most use only the most basic methods and thus only scratch the surface of what might be accomplished. The chapter presents cutting-edge methods, with relevant examples and illustrations from health services research.

References

Airoldi EM, Fienberg SE, Xing EP. Mixed membership stochastic blockmodels. J Mach Learn Res. 2008;9:1981–2014.
PubMed PubMed Central Google Scholar
Anselin L. Spatial econometrics: methods and models. Dordrecht: Kluwer; 1988.
Book Google Scholar
Barabasi A-L, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–12. http://www.sciencemag.org/content/286/5439/509.abstract
Barabasi A-L, Albert R, Jeong H. Mean-field theory for scale-free random networks. Phys A Stat Mech Appl. 1999;272:173–87. http://www.sciencedirect.com/science/article/pii/S0378437199002915.
Article Google Scholar
Barnett ML, Landon BE, O’Malley AJ, Keating NL, Christakis NA. Mapping physician networks with self-reported and administrative data. Health Serv Res. 2011;46:1592–609.
Article PubMed PubMed Central Google Scholar
Barnett ML, Christakis NA, O’Malley AJ, Onnela J-P, Keating NL, Landon BE. Physician patient-sharing networks and the cost and intensity of care in US hospitals. Med Care. 2012a;50:152–60.
Article PubMed PubMed Central Google Scholar
Barnett ML, Keating NL, Christakis NA, O’Malley AJ, Landon BE. Reasons for referral among primary care and specialist physicians. J Gen Intern Med. 2012b;27:506–12.
Article PubMed Google Scholar
Berkman L, Glass T. Social integration, social methods, social support, and health. In: Social epidemiology. New York: Oxford University Press; 2000. p. 137–73.
Google Scholar
Boguñá M, Pastor-Satorras R, Díaz-Guilera A, Arenas A. Models of social networks based on social distance attachment. Phys Rev E. 2004;70:056122. https://doi.org/10.1103/PhysRevE.70.056122.
Article CAS Google Scholar
Bonacich P. Power and centrality: a family of measures. Am J Sociol. 1987;92:1170–82.
Article Google Scholar
Borgatti S, Everett M. Network analysis of 2-mode data. Soc Networks. 1997;19:243–69.
Article Google Scholar
Breiger R. The duality of persons and groups. Soc Forces. 1974;53:181–90.
Article Google Scholar
Cartwright D, Harrary F. A generalization of Heider’s theory. Psychol Rev. 1956;63:277–92.
Article PubMed CAS Google Scholar
Centola D. Failure in complex social networks. Math Sociol. 2009;33:64–8.
Article Google Scholar
Choi D, Wolfe P, Airoldi E. Stochastic blockmodels with growing number of classes. Arxiv preprint. 2010;arXiv:1011.4644.
Google Scholar
Christakis N, Fowler J. The spread of obesity in a large social network over 32 years. N Engl J Med. 2007;357:370–9.
Article PubMed CAS Google Scholar
Christakis NA, Fowler JH. Social contagion theory: examining dynamic social networks and human behavior. Stat Med. 2013;32:556–77.
Article PubMed Google Scholar
Coleman J, Katz E, Menzel H. The diffusion of innovations among physicians. Sociometry. 1957;20:253–70.
Article Google Scholar
Coleman J, Katz E, et al. Medical innovation: a diffusion study. Indianapolis: Bobbs-Merrill; 1966.
Google Scholar
Davidsen J, Ebel H, Bornholdt S. Emergence of a small world from local interactions: modeling acquaintance networks. Phys Rev Lett. 2002;88:128701. https://doi.org/10.1103/PhysRevLett.88.128701.
Article PubMed CAS Google Scholar
Dorogovtsev SN, Mendes JFF, Samukhin AN. Structure of growing networks with preferential linking. Phys Rev Lett. 2000;85:4633–6. https://doi.org/10.1103/PhysRevLett.85.4633.
Article PubMed CAS Google Scholar
Duijn MV, Snijders TAB, Zijlstra B. P2: a random effects model with covariates for directed graphs. Statistica Neerlandica. 2004;58:234–54.
Article Google Scholar
Erdős P, Rényi A. Random graphs. Publ Math. 1959;6:290–7.
Google Scholar
Faust K. Centrality in affliation networks. Soc Networks. 1997;19:157–91.
Article Google Scholar
Feller W. An introduction to probability theory and its applications, vol. 2. New York: Wiley; 1966.
Google Scholar
Festinger L. The analysis of sociograms using matrix algebra. Hum Relat. 1949;2:153–8.
Article Google Scholar
Fineberg S, Wasserman S. Categorical data analysis of single sociometric relations. In: Sociological methodology. New Jersey: Jossey-Bass; 1981. p. 156–92.
Google Scholar
Fletcher JM. Social interactions and smoking: evidence using multiple student cohorts, instrumental variables, and school fixed effects. Health Econ. 2008;19:466–84.
Article Google Scholar
Fletcher JM, Lehrer SF. The effect of adolescent health on educational outcomes: causal evidence using genetic lotteries between siblings. Canadian labor market and skills researcher network, working paper no. 32. 2009.
Google Scholar
Fortunato S. Community detection in graphs. Phys Reports. 2010;486:75–174.
Article Google Scholar
Frank O, Strauss D. Markov graphs. J Am Stat Assoc. 1986;81:832–42.
Article Google Scholar
Freeman L. Centrality in social networks, I. Conceptual clarification. Soc Networks. 1979;1:215–39.
Article Google Scholar
Freeman L. The development of social network analysis: a study in the sociology of science. Vancouver: Empirical Press; 2004.
Google Scholar
Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabasi A-L. The human disease network. Proc Natl Acad Sci. 2007;104:8685–90. http://www.pnas.org/content/104/21/8685.abstract
Article PubMed CAS PubMed Central Google Scholar
Goldenberg A, Zheng AX, Fineberg SE, Airoldi EM. A survey of statistical network models. Found Trends Mach Learn. 2009;2:129–233.
Article Google Scholar
Goodreau S. Advances in exponential random graph (p*) models applied to a large social network. Soc Networks. 2007;29:231–48.
Article PubMed PubMed Central Google Scholar
Granovetter MS. The strength of weak ties. Am J Sociol. 1973;78:1360–80.
Article Google Scholar
Guimera R, Nunes Amaral LA. Functional cartography of complex metabolic networks. Nature. 2005;433:895–900.
Article PubMed PubMed Central CAS Google Scholar
Haines V, Hurlbert J. Network range and health. J Health Soc Behav. 1992;33:254–66.
Article PubMed CAS Google Scholar
Handcock MS, Robins GL, Snijders TAB, Moody J, Besag J. Assessing degeneracy in statistical models of social networks. J Am Stat Assoc. 2003;76:33–50.
Google Scholar
Handcock M, Raftery A, Tantrum J. Model-based clustering for social networks. J Roy Stat Soc A. 2007;170:301–54.
Article Google Scholar
Handcock MS, Hunter DR, Butts CT, Goodreau SM, Krivitsky PN, Morris M. ergm: A package to fit, simulate and diagnose exponential-family models for networks, http://CRAN.R-project.org/package=ergm. Version 2.2-6. 2010. Project home page at http://statnetproject.org
Hanneke S, Fu W, Xing EP. Discrete temporal models of social networks. Electron J Stat. 2010;4:585–605.
Article Google Scholar
Harary F. On the notion of balance of a signed graph. Mich Math J. 1953;2:143–6.
Article Google Scholar
Harary F. The number of linear, directed rooted and connected graphs. Trans Am Math Soc. 1955;78:445–63.
Article Google Scholar
Heider F. Attitudes and cognitive orientation. J Psychol. 1946;21:107–12.
Article PubMed CAS Google Scholar
Hidalgo CA, Blumm N, Barabasi A-L, Christakis NA. A dynamic network approach for the study of human phenotypes. PLoS Comput Biol. 2009;5:e1000353. https://doi.org/10.1371/journal.pcbi.1000353.
Article PubMed PubMed Central CAS Google Scholar
Hoff PD. Bilinear mixed effects models for dyadic data. J Am Stat Assoc. 2005;100:286–95.
Article CAS Google Scholar
Hoff P. Modeling homophily and stochastic equivalence in symmetric relational data. In: Advances in neural information processing systems, vol. 20. Cambridge, MA: MIT Press; 2008. p. 657–64.
Google Scholar
Hoff PD, Raftery AE, Handcock MS. Latent space models for social networks analysis. J Am Stat Assoc. 2002;97:1090–8.
Article Google Scholar
Holland P, Leinhardt S. An exponential family of probability-distributions for directed-graph. J Am Stat Assoc. 1981;76:33–50.
Article Google Scholar
Holland P, Laskey K, Leinhardt S. Stochastic blockmodels: some first steps. Soc Networks. 1983;5:109–37.
Article Google Scholar
House J, Kahn R. Measures and concepts of social support. In: Social support and health. Orlando: Academic; 1985. p. 83–108.
Google Scholar
Huisman M, Van Duijn M. Software for statistical analysis of social networks. In: The Sixth International Conference on Logic and Methodology; Amsterdam: 2004.
Google Scholar
Huisman M, Van Duijn M. Software for social networks analysis. In: Models and methods in social network analysis. Cambridge: Cambridge University Press; 2005.
Google Scholar
Hunter D. Curved exponential family models for social networks. Soc Networks. 2007;29:216–30.
Article PubMed PubMed Central Google Scholar
Hunter DR, Handcock MS. Inference in curved exponential family models for networks. J Comput Graph Stat. 2006;15:565–83.
Article Google Scholar
Iwashyna TJ, Chang VW, Zhang JX, Christakis AN. Physician social networks and variation in prostate cancer treatment in three cities. Health Serv Res. 2002;37:1531–51.
Article PubMed PubMed Central Google Scholar
Karrer B, Newman MEJ. Stochastic blockmodels and community structure in networks. Phys Rev E. 2011;83:016107. https://doi.org/10.1103/PhysRevE.83.016107.
Article CAS Google Scholar
Katz L. On the matrix analysis of Sociometric data. Sociometry. 1947;10:233–41.
Article Google Scholar
Katz L. A new status index derived from sociometric analysis. Psychometrika. 1953;18:39–43.
Article Google Scholar
Katz L, Powell JH. Measurement of the tendency toward reciprocation of choice. Sociometry. 1955;18:659–65.
Article Google Scholar
Keating NL, Ayanian JZ, Cleary PD, et al. Factors affecting influential discussions among physicians: a social network analysis of a primary care practice. J Gen Intern Med. 2007;22:794–8.
Article PubMed PubMed Central Google Scholar
Klovdahl A. Social networks and the spread of infectious diseases. Soc Sci Med. 1985;21:1203–16.
Article PubMed CAS Google Scholar
Kossinets G, Watts DJ. Empirical analysis of an evolving social network. Science. 2006;311:88–90. http://www.sciencemag.org/content/311/5757/88.abstract
Article PubMed CAS Google Scholar
Krapivsky PL, Redner S, Leyvraz F. Connectivity of growing random networks. Phys Rev Lett. 2000;85:4629–32. https://doi.org/10.1103/PhysRevLett.85.4629.
Article PubMed CAS Google Scholar
Krivitsky PN. Exponential-family random graph models for valued networks. 2012. arXiv preprint, 1101.1359v2 [stat.ME] 19 Jan 2012.
Google Scholar
Krivitsky PN, Handcock MS. Fitting position latent cluster models for social networks with latentnet. J Stat Softw. 2008;24. http://statnetproject.org
Krivitsky PN, Handcock MS. A separable model for dynamic networks. 2010. arXiv preprint, 1011.1937v1[stat.ME].
Google Scholar
Kumpula JM, Onnela J-P, Saramäki J, Kaski K, Kertész J. Emergence of communities in weighted networks. Phys Rev Lett. 2007;99:228701. https://doi.org/10.1103/PhysRevLett.99.228701.
Article PubMed CAS Google Scholar
Landon BE, Keating NL, Barnett ML, Onnela JP, Paul S, OˆaMalley AJ, Keegan T, Christakis NA. Variation in patient-sharing networks of physicians across the United States. JAMA. 2012;308:265–73.
PubMed PubMed Central CAS Google Scholar
Laumann E, Marsden P, Prensky D. The boundary specification problem in network analysis. In: Burt R, Minor M, editors. Applied network analysis: a methodological introduction. Beverly Hills: Sage; 1983. p. 18–34.
Google Scholar
Lorrain F, White H. Structural equivalence of individuals in social networks. J Math Sociol. 1971;1:49–80.
Article Google Scholar
Lyons R. The spread of evidence-poor medicine via flawed social-network analyses. Stat Polit Policy. 2011;2:1–26.
Google Scholar
Manski CA. Identification of endogenous social effects: the reflection problem. Rev Econ Stud. 1993;60:531–42.
Article Google Scholar
Marsden P. Network methods in social epidemiology. In: Methods in social epidemiology. New York: Jossey-Bass; 2006. p. 267–86.
Google Scholar
Marsden PV, Friedkin NE. Network studies of social influence. Sociol Methods Res. 1993;22:127–51.
Article Google Scholar
Marsili M, Vega-Redondo F, Slanina F. The rise and fall of a networked society: a formal model. Proc Natl Acad Sci USA. 2004;101:1439–42.
Article PubMed CAS PubMed Central Google Scholar
McPherson ML, Smith-Lovin C, et al. Birds of a feather: homophily in social networks. Annu Rev Sociol. 2001;27:415–44.
Article Google Scholar
Moreno JL. Who shall survive? Nervous and mental disease processing. The University of Michigan, Ann Arbor; 1934.
Google Scholar
Mucha PJ, Richardson T, Macon K, Porter MA, Onnela J-P. Community structure in time-dependent, multiscale, and multiplex networks. Science. 2010;328:876–8. http://www.sciencemag.org/content/328/5980/876.abstract
Article PubMed CAS Google Scholar
Newcomb TM. An approach to the study of communicative acts. Psychol Rev. 1953;60:393–404.
Article PubMed CAS Google Scholar
Newman ME. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev. 2001;64:016132.
CAS Google Scholar
Newman MEJ. Modularity and community structure in networks. Proc Natl Acad Sci. 2006;103:8577–82.
Article PubMed CAS PubMed Central Google Scholar
Newman M. Networks: an introduction. New York: Oxford University Press; 2010.
Book Google Scholar
Newman MEJ. Communities, modules and large-scale structure in networks. Nat Phys. 2012;8:25–31.
Article CAS Google Scholar
Newman MEJ, Girvan M. Mixing patterns and community structure in networks. In: Pastor-Satorras R, Rubi J, Diaz-Guilera A, editors. Statistical mechanics of complex networks. Berlin: Springer; 2003.
Google Scholar
Newman MEJ, Girvan M. Finding and evaluating community structure in networks. Phys Rev E. 2004;69:026113. https://doi.org/10.1103/PhysRevE.69.026113.
Article CAS Google Scholar
Nowicki K, Snijders TAB. Estimation and prediction for stochastic blockstructures. J Am Stat Assoc. 2001;96:1077–87.
Article Google Scholar
O’Malley AJ. The analysis of social network data: an exciting frontier for statisticians. Stat Med. 2013;32:539–55.
Article PubMed Google Scholar
O’Malley AJ, Christakis NA. Longitudinal analysis of large social networks: estimating the effect of health traits on changes in friendship ties. Stat Med. 2011;30:950–64.
Article PubMed PubMed Central Google Scholar
O’Malley AJ, Marsden PV. The analysis of social networks. Health Serv Outcome Res Methodol. 2008;8:222–69.
Article Google Scholar
O’Malley AJ, Arbesman S, Steiger DM, Fowler JH, Christakis NA. Egocentric social network structure, health, and pro-social behaviors in a National Panel Study of Americans. PLoS One. 2012;7:e36250. https://doi.org/10.1371/journal.pone.0036250.
Article PubMed PubMed Central CAS Google Scholar
Opsahl T. Triadic closure in two-mode networks: redefining the global and local clustering coefficients. Soc Networks. 2011; 34. https://doi.org/10.1016/j.socnet.2011.07.001.
Article Google Scholar
Opsahl T, Agneessens F, Skvoretz J. Node centrality in weighted networks: generalizing degree and shortest paths. Soc Networks. 2010;32:245–51.
Article Google Scholar
Palla G, Derenyi I, Farkas I, Vicsek T. Uncovering the overlapping community structure of complex networks in nature and society. Nature. 2005;435:814–8. https://doi.org/10.1038/nature03607.
Article PubMed CAS Google Scholar
Paul S, O’Malley AJ. Hierarchical longitudinal models of relationships in social networks. J R Stat Soc Ser C Appl Stat. 2013;62:705–22.
PubMed PubMed Central Google Scholar
Pham HH, O’Malley AS, Bach PB, Saiontz-Martinez C, Schrag D. Primary care physicians’ links to other physicians through Medicare patients: the scope of care coordination. Ann Intern Med. 2009;150:236–42.
Article PubMed PubMed Central Google Scholar
Piraveenan M, Prokopenko M, Zomaya AY. Assortative mixing in directed biological networks. IEEE Trans Comput Biol Bioinform. 2010;9:66–78. To appear.
Article PubMed Google Scholar
Pollack CE, Weissman G, Bekelman J, Liao K, Armstrong K. Physician social networks and variation in prostate cancer treatment in three cities. Health Serv Res. 2012;47:380–403.
Article PubMed PubMed Central Google Scholar
Porter MA, Onnela J-P, Mucha PJ. Communities in networks. Not Am Math Soc. 2009;56(1082–1097):1164–6.
Google Scholar
Price DDS. A general theory of bibliometric and other cumulative advantage processes. J Am Soc Inf Sci. 1976;27:292–306. https://doi.org/10.1002/asi.4630270505.
Article Google Scholar
Robins G, Pattison P, Woolcock J. Small and other worlds: global network structures from local processes. Am J Sociol. 2005;110:894–936.
Article Google Scholar
Robins GL, Snijders TAB, Wang P, Handcock MS, Pattison PE. Recent developments in exponential random graph (p^∗) models for social networks. Soc Networks. 2007;29:192–215.
Article Google Scholar
Robins GL, Pattison PE, Wang P. Closure, connectivity and degree distributions: exponential random graph (p*) models for directed social networks. Soc Networks. 2009;31:105–7.
Article Google Scholar
Rubin D. Bayesian inference for causal effects: the role of randomization. Ann Stat. 1978;6:34–58.
Article Google Scholar
Seidman SB. Network structure and minimum degree. Soc Networks. 1983;5:269–87.
Article Google Scholar
Shalizi RR, Rinaldo A. Consistency under sampling of exponential random graph models. 2012. arXiv preprint. arXiv:1111.3054v3
Google Scholar
Shalizi CR, Thomas AC. Homophily and contagion are generically confounded in observational social network studies. Sociol Methods Res. 2011;40:211–39.
Article PubMed PubMed Central Google Scholar
Simmel G. The sociology of Georg Simmel. New York: The Free Press; 1908.
Google Scholar
Snijders T. The degree variance: an index of graph heterogeneity. Soc Networks. 1981;3:163–74.
Article Google Scholar
Snijders T. Stochastic actor-oriented models for network change. J Math Sociol. 1996;21:149–72.
Article Google Scholar
Snijders TAB. The statistical evaluation of social network dynamics. In: Sociological methodology. Oxford, UK: Basil Blackwell; 2001. p. 361–95.
Google Scholar
Snijders TAB. Models for longitudinal social network data. In: Models and methods in social network analysis. Cambridge: Cambridge University Press; 2005. p. 215–47.
Chapter Google Scholar
Snijders TAB. Statistical methods for network dynamics. In: Luchini SR et al., editors. Proceedings of the XLIII Scientific Meeting, Italian Statistical Society, Basil Blackwell, Ltd; 2006. p. 281–96
Google Scholar
de Solla Price DJ. Networks of scientific papers. Science. 1965;149:510–5. http://www.sciencemag.org/content/149/3683/510.short.
Article Google Scholar
Steglich C, Snijders TAB, Pearson M. Dynamic networks and behavior: separating selection from influence. Sociol Methodol. 2010;40:329–93.
Article Google Scholar
Szabo G, Barabasi AL. Network effects in service usage. 2007. Arxiv preprint. http://lanl.arxiv.org/abs/physics/0611177
Thompson S. Adaptive web sampling. Biometrics. 2006;62:1224–34.
Article PubMed Google Scholar
Thompson S, Frank O. Mode-based estimation with link-tracing sampling designs. Survey Methodol. 2000;26:87–98.
Google Scholar
Thompson S, Seber GAF. Adaptive sampling. New York: Wiley; 1996.
Google Scholar
Toivonen R, Onnela J-P, Saramäki J, Hyvönen J, Kaski K. A model for social networks. Phys A Stat Mech Appl. 2006;371:851–60. http://www.sciencedirect.com/science/article/pii/S0378437106003931
Article Google Scholar
Traud AL, Mucha PJ, Porter MA. Social structure of Facebook networks. Phys A Stat Mech Appl. 2012;391:4165–80. http://www.sciencedirect.com/science/article/pii/S0378437111009186
Article Google Scholar
VanderWeele TJ. Sensitivity analysis for contagion effects in social networks. Sociol Methods Res. 2011;40:240–55.
Article PubMed PubMed Central Google Scholar
VanderWeele TJ, Ogburn EL, Tchetgen Tchetgen EJ. Why and when “Flawed” social network analyses still yield valid tests of no contagion. Stat Polit Policy. 2012;3:1050. https://doi.org/10.1515/2151-7509.1050.
Vázquez A. Growing network with local rules: preferential attachment, clustering hierarchy, and degree correlations. Phys Rev E. 2003;67:056104. https://doi.org/10.1103/PhysRevE.67.056104.
Article CAS Google Scholar
Wang W, Wong G. Stochastic Blockmodels for directed graphs. J Am Stat Assoc. 1987;82:8–19.
Article Google Scholar
Wang P, Sharpe K, Robins GL, Pattison PE. Exponential random graph (p*) models for affiliation networks. Soc Networks. 2009;31:12–25.
Article Google Scholar
Wasserman SS, Faust K. Social network analysis: methods and applications. Cambridge: Cambridge University Press; 1994.
Book Google Scholar
Wasserman S, Pattison P. Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p^∗. Psychometrika. 1996;61:401–25.
Article Google Scholar
Westveld AH, Hoff PD. A mixed effect model for longitudinal relational and network data, with applications to international trade and conflict. Ann Appl Stat. 2011;5:843–72.
Article Google Scholar
White D, Harary F. The cohesiveness of blocks in social networks: node connectivity and conditional density. Sociol Methodol. 2001;31:305–59.
Article Google Scholar
Wong LH, Pattison P, Robins G. A spatial model for social networks. Phys A Stat Mech Appl. 2006;360:99–120. http://www.sciencedirect.com/science/article/pii/S0378437105004334
Article Google Scholar
Zijlstra BJH, Duijn MV, Snijders TAB. The multilevel P2 model: a random effects model for the analysis of multiple social networks. Methodology. 2006;2:42–7.
Article Google Scholar

Download references

Acknowledgments

The time and effort that Dr. O’Malley and Dr. Onnela devoted to researching and developing this chapter were supported by NIH/NIA grant P01 AG031093 and Robert Wood Johnson Award #58729. The authors thank Mischa Haider, Brian Neelon, and Bruce E. Landon for reviewing an early draft of the manuscript and providing several useful comments and suggestions.

Author information

Authors and Affiliations

The Dartmouth Institute for Health Policy and Clinical Practice, Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
Alistair James O’Malley
Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
Jukka-Pekka Onnela
Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
Alistair James O’Malley

Corresponding author

Correspondence to Alistair James O’Malley.

Editor information

Editors and Affiliations

Community Health and Epidemiology, Dalhousie University, Halifax, NS, Canada
Adrian Levy
ICON plc, Vancouver, BC, Canada
Sarah Goring
Department of Biostatistics, Brown University, Providence, RI, USA
Constantine Gatsonis
University of British Columbia, Vancouver, BC, Canada
Boris Sobolev
European Observatory on Health Systems and Policies, Department of Health Care Management, Berlin University of Technology, Berlin, Germany
Ewout van Ginneken
Department of Health Care Management, Faculty of Economics and Management, Technische Universität Berlin, Berlin, Germany
Reinhard Busse

Glossary of Terms

This glossary serves two groups of readers. It helps readers familiar with social networks understand the network science component of the chapter. It also helps readers familiar with network science understand the social network component. It provides a comprehensive list of terms and definitions.

Terms Used in Social Networks

1.
Social network: A collection of individuals (referred to as actors) and the (social) relationships or ties linking them.
2.
Relationship, Tie: A link or connection between two actors.
3.
Dyad: A pair of actors in a network and the relationship(s) between them. There are two relationships per measure in a directed network and one relationship per measure in an undirected network.
4.
Triad: A set of three actors in the network and the relationships between them.
5.
Scaled or valued relationship: A nonbinary relationship between two actors (e.g., the level of a trait). The chapter focuses on binary relationships.
6.
Directed network: A network in which the relationship from actor i to actor j need not be the same as that from actor j to actor i.
7.
Nondirected (undirected) network: A network in which the state of the relationship from actor i to actor j equals the state of the relationship from actor j to actor i.
8.
Sociocentric network data: The complete set of observations on the n(n − 1) relationships in a directed network, or n(n − 1)/2 relationships in an undirected network, with n actors.
9.
Collaboration network: A network whose ties represent the actors’ joint involvement on a task (e.g., work on a paper) or a common experience (e.g., treating the same episode of health care for a patient).
10.
Bipartite: Relationships are only permitted between actors of two different types.
11.
Unipartite: Relationships are permitted between all types of actors.
12.
Social contagion, Social influence, Peer effects: Terms used to describe the phenomenon whereby an actor’s trait changes due to their relationship with other actors and the traits of those actors.
13.
Mutable trait: A characteristic of an actor that can change state.
14.
Social selection: The phenomenon whereby the relationship status between two actors depends on their characteristics, as occurs with homophily and heterophily.
15.
Homophily: A preference for relationships with actors who have similar characteristics. Popularly referred to as “birds of a feather flock together.”
16.
Heterophily: A preference for relationships with actors who have different characteristics. Popularly referred to as “opposites attracting.”
17.
In-degree, Popularity: The number of actors who initiated a tie with the given actor.
18.
Out-degree, Expansiveness, Activity: The number of ties the given actor initiates with other actors.
19.
k-star: A subnetwork in which the focal actor has ties to k other actors.
20.
k-cycle: A subnetwork of k actors, each with degree 2, that can be arranged as a ring (i.e., a k-path through the actors returns to its origin without backtracking). For example, the ties A-B, B-C, and C-A form a three-cycle.
21.
k degrees of separation: Two individuals are separated by k degrees if they are linked by a k-path (k − 1 intermediary actors) but are not connected by any path of length k − 1 or less.
22.
Density: The overall tendency of ties to form in the network. A descriptive measure is given by the number of ties in the network divided by the total number of possible ties.
23.
Reciprocity: The phenomenon whereby an actor i is more likely to have a tie with actor j if actor j has a tie with actor i. Only defined for directed networks.
24.
Clustering: The tendency of ties to cluster and form densely connected regions of the network.
25.
Closure: The tendency for open network configurations to become closed. For example, a two-path A-B-C becomes closed when a tie forms between A and C.
26.
Transitivity: The tendency for a tie from individual A to individual B to form if ties from individual A to individual C and from individual C to individual B exist. A form of triadic closure commonly stated as “a friend of a friend is a friend.” Reduces to general triadic closure in an undirected network.
27.
Centrality: A dimensionless measure of an actor’s position in the network. Higher values indicate more central positions. There are numerous measures of centrality. Four common ones are degree, closeness, betweenness, and eigenvector centrality. Degree and eigenvector centrality are extremes. Degree centrality is determined solely by an actor’s degree (it is internally focused). Eigenvector centrality is based on the centrality of the actors connected to the focal actor (it is externally focused).
28.
Structural balance: A theory suggesting that actors seek balance in their relationships. For example, if A likes B and B likes C, then A will endeavor to like C as well to keep the system balanced. Thus, structural balance implies transitivity.
29.
Structural equivalence: The network configuration (arrangement of ties) around one actor is similar to that of another actor. Even though actors may not be connected, they can still be in structurally similar situations.
30.
Structural power: The power an actor derives from occupying a dominant position in the network. An example is a strategic position as the only bridge between otherwise distinct components.
31.
Network component: A maximal subset of actors who are connected to one another, directly or through intermediaries, and who have no ties to actors outside the subset.
32.
Graph theory: The mathematical basis under which theoretical results for networks are derived and empirical computations are performed.
33.
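Digraph (directed graph): A graph in which each edge has a direction, so the edge from i to j may be present when the edge from j to i is absent. Unlike social networks, digraphs can contain self-ties. Each edge of a graph joins exactly two nodes.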
34.
Hypergraph: A generalization of a graph in which a single edge (a hyperedge) can join three or more nodes.
35.
Maximal subset: A set of actors among whom all possible ties are present in a binary network (i.e., the subnetwork has density 1.0). If the set contains k actors, the maximal subset is referred to as a k-clique.
36.
Scalar, vector, matrix: Terms from linear and abstract algebra. A scalar is a 1 × 1 matrix, a vector is a k × 1 matrix, and a matrix is k × p, where k, p > 1.
37.
Adjacency matrix: A matrix whose off-diagonal elements contain the value of the relationship from one actor to another. For example, element ij contains the relationship from actor i to actor j. The diagonal elements are zero by definition.
38.
Matrix transpose: The operation whereby element ij is exchanged with element ji for all i, j.
39.
Row stochastic matrix: A matrix whose rows sum to 1 and contain nonnegative elements. Thus, each row represents a probability distribution of a discrete-valued random variable.
40.
Random variable: A variable whose value is not known with certainty. It can relate to an event or time period that is yet to occur, or it can be a quantity whose value is fixed (i.e., has occurred) but is unknown.
41.
Parametric: A term used in statistics to describe a model with a specific functional form (e.g., linear, quadratic, logarithmic, exponential) indexed by unknown parameters or an estimation procedure that relies on specification of the complete distribution of the data.
42.
Nonparametric: A model or estimation procedure that makes no assumption about the specific form of the relationship between key variables (e.g., whether the predictors have linear or additive effects on the outcome). It also does not rely upon complete specification of the distribution of the data for estimation.
43.
Outcome, Dependent variable: The variable considered causally dependent on other variables of interest. This is typically a variable whose value is believed to be caused by other variables.
44.
Independent, Predictor, Explanatory variable, Covariate: A variable believed to be a cause of the outcome.
45.
Contextual variable: A variable evaluated on the neighbors of, or other members of a set containing, the focal actor. Examples include the proportion of females in a neighboring county and the proportion of an actor’s friends with college degrees.
46.
Interaction effect: The extent to which the effect of one variable on the outcome varies across the levels of another variable.
47.
Endogenous variable: A variable (or an effect) that is internal to a system. Predictors in a regression model that are correlated with the unobserved error are endogenous; they are determined by an internal as opposed to an external process. By definition, outcome variables are endogenous.
48.
Exogenous variable: A variable (or an effect) that is external to the system in that its value is not determined by other variables in the system. Predictors that are independent of the error term in a regression model are exogenous.
49.
Instrumental variable (IV): A variable that has a non-null effect on the endogenous predictor whose causal effect is of interest (the “treatment”) but has no effect on the outcome other than through its effect on the treatment. Often-used sufficient conditions for the latter are that the IV is (i) marginally independent of any unmeasured confounders and (ii) conditionally independent of the outcome given the treatment and any unmeasured confounders. In an IV analysis, a set of observed predictors may be conditioned on as long as they are not effects of the treatment and the IV assumptions hold conditional on them. Although subject to controversy, IV methods are among the few methods available for estimating the true (causal) effect of an endogenous predictor on an outcome.
50.
Linear regression model: A model in which the outcome (or dependent variable) equals a linear combination of one or more predictors (or explanatory variables), that is, an additive sum of the predictors multiplied by their regression coefficients, plus an unobserved random error. Equivalently, the expected value of the outcome conditional on the predictors is a linear combination of the predictors.
51.
Longitudinal model: A model that describes variation in the outcome variable over time as a function of the predictors, which may include prior (i.e., lagged) values of the outcome. Observations are typically only available at specific, but not necessarily equally spaced, times. Longitudinal models make the direction of causality explicit. Therefore, they can distinguish between the association between the predictors and the outcome and the effect of a change in the predictor on the change in the outcome.
52.
Cross-sectional model: A model of the relationship between the values of the predictors and outcomes at a given time. Because one cannot discern the direction of causality, cross-sectional models are more difficult to defend as causal.
53.
Stochastic block model: A conditional dyadic independence model in which the density and reciprocity effects differ between blocks defined by attributes of the actors comprising the network. For example, blocks for gender accommodate different levels of connectedness and reciprocity for men and women.
54.
Logistic regression: A member of the exponential family (generalized linear) class of models that is specific to binary outcomes. It uses the logit link function, which maps the expected value of the outcome (a probability between 0 and 1) onto an unrestricted scale to ensure that all predictions from the model are well-defined.
55.
Multinomial distribution: A generalization of the binomial distribution to three or more categories. The probabilities of the categories sum to 1.
56.
Exponential random graph model: A model in which the state of the entire network is the dependent variable. Provides a flexible approach to accounting for various forms of dependence in the network. Not amenable to causal modeling.
57.
Degeneracy: An estimation problem encountered with exponential random graph models in which the fitted model might reproduce observed features of the network on average, but individual networks drawn from the model bear no resemblance to the observed network. Degenerate draws are often empty or complete graphs.
58.
Latent distance model: A model in which the statuses of dyads are independent conditional on the positions of the actors, and thus the distances between them, in a latent social space.
59.
Latent eigenmodel: A model in which the statuses of dyads are independent conditional on the product of the (weighted) latent positions of the actors in each dyad.
60.
Latent variable: An unobserved random variable. Random effects and pure error terms are latent variables.
61.
Latent class: An unobserved categorical random variable. Actors with the same value of the variable are considered to be in the same latent class.
62.
Factor analysis: A statistical technique used to decompose the correlation (or covariance) matrix of a set of random variables into groups of related items.
63.
Generalized estimating equation (GEE): A statistical method that accounts for dependence among observations when estimating model parameters and their standard errors, without necessarily modeling the form of the dependence or specifying the full distribution of the data.
64.
Random effect: A parameter for the effect of a unit (or cluster) that is drawn from a specified probability distribution. Treating the unit effects as random draws from a common probability distribution allows information to be pooled across units for the estimation of each unit-specific parameter.
65.
Fixed effect: A parameter in a model that reflects the effect of an actor belonging to a given unit (or cluster). By virtue of modeling the unit effects as unrelated parameters, no information is shared between units and so estimates are based only on information within the unit.
66.
Ordinary least squares: A commonly used method for estimating the parameters of a regression model. The estimates are chosen to minimize the sum of squared differences between the fitted values and the observed values of the dependent variable.
67.
Maximum likelihood: A method of estimating the parameters of a statistical model that typically embodies parametric assumptions. The procedure is to seek the values of the parameters that maximize the likelihood function of the data.
68.
Likelihood function: An expression giving the probability (or probability density) of the observed data as a function of the model parameters; it summarizes the information the data provide about those parameters.
69.
Markov chain Monte Carlo: A numerical procedure used to fit Bayesian statistical models.
70.
Steady state: The stationary distribution of a Markov chain describes the long-run proportion of time that the random variable being modeled spends in each state. Markov chains often pass through a transient phase in which the current state of the chain depends less and less on its initial state. The steady-state phase is reached when successive samples have the same distribution (i.e., there is no dependence on the initial state).
71.
Collinearity: The correlation between two predictors after conditioning on the other observed predictors (if any). When predictors are collinear, distinguishing their effects is difficult, and the statistical properties of the estimated effects are more sensitive to the validity of the model.
72.
Normal distribution: Another name for the Gaussian distribution. Has a bell-shaped probability density function.
73.
Covariance matrix: A matrix in which the ijth element contains the covariance of items i and j.
74.
Absolute or Geodesic distance: The total distance along the edges of the shortest path in the network from one actor to another.
75.
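Cartesian distance: The straight-line distance between two points on a two-dimensional surface or grid. It follows from the Pythagorean theorem: for points \((x_1, y_1)\) and \((x_2, y_2)\), the distance is \( \sqrt{(x_1-x_2)^2+(y_1-y_2)^2} \).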
76.
Count data: Observations made on a variable with the whole numbers (0, 1, 2, …) as its state space.
77.
Statistical inference: The process of establishing the level of certainty of knowledge about unknown parameters (or hypotheses) from data subject to random variation, such as when observations are measured imperfectly but with no systematic bias, or when a sample from a population of interest is used to estimate population parameters.
78.
Null model: A model of a network statistic that represents what would typically be expected if the feature of interest were nonexistent (effect equal to 0) or outside the range of interest.
79.
Permutation test: A statistical test of a null hypothesis against an alternative, implemented by randomly reshuffling the labels (i.e., the subscripts) of the observations. The p-value of the test is estimated by permuting the observed data 50–100 times and computing the proportion of permutations in which the test statistic is at least as extreme as its observed value.
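
The permutation test entry above can be illustrated with a minimal sketch in plain Python. It assumes a hypothetical six-actor undirected network and a binary actor attribute, and uses the difference in mean degree between the two attribute groups as the test statistic; the function names, the statistic, the number of permutations (`n_perm`), and the random seed are illustrative choices, not the chapter's. Following a common convention, the observed labelling is counted among the permutations so that the estimated p-value is never exactly zero.

```python
import random


def degrees(adj):
    """Degree of each actor: row sums of a binary adjacency matrix."""
    return [sum(row) for row in adj]


def mean_diff(values, labels):
    """Mean of `values` for actors labelled 1 minus the mean for actors labelled 0."""
    g1 = [v for v, g in zip(values, labels) if g == 1]
    g0 = [v for v, g in zip(values, labels) if g == 0]
    return sum(g1) / len(g1) - sum(g0) / len(g0)


def permutation_test(values, labels, n_perm=1000, seed=0):
    """Two-sided permutation test of the null hypothesis of no group difference.

    Returns the observed statistic and a Monte Carlo p-value: the proportion of
    label permutations (including the observed labelling) whose statistic is at
    least as extreme, in absolute value, as the observed one.
    """
    rng = random.Random(seed)
    observed = mean_diff(values, labels)
    shuffled = list(labels)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # randomly reassign the labels to actors
        if abs(mean_diff(values, shuffled)) >= abs(observed):
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_perm + 1)


# Hypothetical undirected network of six actors (symmetric, zero diagonal)
adj = [
    [0, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
labels = [1, 1, 1, 0, 0, 0]  # hypothetical binary attribute of each actor

obs, p = permutation_test(degrees(adj), labels)
print(f"observed difference = {obs:.3f}, p = {p:.3f}")
```

With these toy data, 14 of the 20 distinct ways of assigning three actors to each group give a mean-degree difference at least as large in magnitude as the observed 2/3, so the estimated p-value should be close to 0.7, i.e., no evidence that degree differs by group.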

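The adjacency matrix, matrix transpose, and row stochastic matrix entries, together with the contextual variable entry, can be tied together in a short plain-Python sketch. It assumes a hypothetical four-actor directed network and a hypothetical binary attribute; `transpose` and `row_normalize` are illustrative helpers, not functions from the chapter or from a particular library. Row sums of the adjacency matrix count ties sent, row sums of its transpose count ties received, and multiplying a row-normalized adjacency matrix by an attribute vector averages that attribute over each actor's contacts.

```python
def transpose(m):
    """Exchange element ij with element ji for all i, j."""
    return [list(col) for col in zip(*m)]


def row_normalize(adj):
    """Divide each row by its sum so that rows with ties become probability distributions.

    Rows that sum to zero (actors with no outgoing ties) are left as all zeros;
    such rows are not probability distributions, and how to handle these actors
    is a modelling decision.
    """
    result = []
    for row in adj:
        total = sum(row)
        result.append([x / total if total else 0.0 for x in row])
    return result


# Hypothetical directed network: adj[i][j] = 1 if actor i nominates actor j.
# The diagonal is zero because self-ties are not recorded.
adj = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
adj_t = transpose(adj)  # adj_t[i][j] is the tie from actor j to actor i

out_degree = [sum(row) for row in adj]   # ties sent:     [2, 1, 3, 0]
in_degree = [sum(row) for row in adj_t]  # ties received: [1, 2, 2, 1]

# Hypothetical attribute: 1 if the actor has a college degree
college = [1, 0, 1, 1]

# Row-normalizing the adjacency matrix and multiplying by the attribute gives a
# contextual variable: the proportion of each actor's nominees with a degree.
w = row_normalize(adj)
contextual = [sum(w_ij * x_j for w_ij, x_j in zip(row, college)) for row in w]
print(contextual)
```

For these hypothetical data, `contextual` is approximately [0.5, 1.0, 0.667, 0.0]; actor 3 nominates no one, so its value of 0 reflects the zero-row convention in `row_normalize` rather than an observed proportion.
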
Terms Used in Network Science

1.
Network science: The approach developed from 1995 onwards, mostly within statistical physics and applied mathematics, to study networked systems across many domains (e.g., physical, biological, and social). It usually focuses on very large systems; hence, theoretical results derived in the thermodynamic limit are good approximations to real-world systems.
2.
Thermodynamic limit: In statistical physics, the limit obtained for any quantity of interest as the system size N tends to infinity. Many analytical results within network science are derived in this limit because it makes the analysis tractable.
3.
Statistical physics: The branch of physics dealing with many-body systems in which the particles obey a fixed set of rules, such as Newtonian mechanics, quantum mechanics, or any other rule set. As the number of bodies (particles) in a system grows, it becomes increasingly difficult (and less informative) to write down the system's equations of motion, the set of differential equations that govern the motion of the particles over time. However, one can describe these systems probabilistically. The word “statistical” is somewhat misleading, as no statistical inference is involved; instead, everything proceeds from a set of axioms, suggesting that “probabilistic” might be a better term. Statistical physics, also called statistical mechanics, gives a microscopic explanation for the phenomena that thermodynamics explains phenomenologically.
4.
Generative model: Most network models within network science belong to this category. Here one specifies the microscopic rules governing, for example, the attachment of new nodes to the existing network structure in models of network growth.
5.
Cumulative advantage: A stylized modeling mechanism introduced by Price in 1976 to capture phenomena where “success breeds success.” Price applied the model to citation patterns, in which the number of citations follows a power-law or power-law-like distribution, and the model successfully reproduced this distribution.
6.
Polya urn model: A stylized sampling model in probability theory where the composition of the system, the contents of the urn, changes as a consequence of each draw from the urn.
7.
Power law: Refers to the specific functional form \( P(x) \sim x^{-\alpha} \) of the distribution of a quantity x. Also called a Pareto distribution. See scale-free network.
8.
Preferential attachment: A stylized modeling mechanism introduced by Barabasi and Albert in 1999 in which the probability that a new node attaches to an existing node i of degree \(k_i\) is an increasing function of \(k_i\); in the case of linear preferential attachment, this probability is directly proportional to \(k_i\). In short, the higher the degree of a node, the higher the rate at which it acquires new connections (increases its degree).
9.
Weak ties hypothesis: A hypothesis developed by sociologist Mark Granovetter in his extremely influential 1973 paper “The strength of weak ties.” The hypothesis, in short, states the following: The stronger the tie connecting persons A and B, the higher the fraction of friends they have in common.
10.
Modularity: Modularity is a quality function used in network community detection, where its value is maximized (in principle) over the set of all possible partitions of the network nodes into communities. Standard modularity reads \( Q=\frac{1}{2m}\sum_{i,j}\left(A_{ij}-\frac{k_i k_j}{2m}\right)\delta(c_i,c_j) \), where \(A_{ij}\) is the ijth element of the adjacency matrix, \(k_i\) is the degree of node i, m is the number of edges, \(c_i\) is the community assignment of node i, and δ is the Kronecker delta.
11.
Rate equations: Rate equations, commonly used to model chemical reactions, are similar to master equations, but instead of modeling the count of objects (e.g., nodes) in a collection of discrete states (e.g., the number of nodes of degree k, \(N_k(t)\), for different values of k), they model the evolution of continuous variables, such as average degree, over time.
12.
Master equations: Widely used in statistical physics, these differential equations model how the state of the system changes from one time point to the next. For example, if \(N_k(t)\) denotes the number of nodes of degree k at time t, then, given the model, one can write down the equation for \(N_k(t+1)\), the number of nodes of degree k at time t + 1.
13.
Fitness, affinity, or attractiveness: A node attribute introduced to incorporate heterogeneity in the node population in a growing network model. For example, in a model based on preferential attachment, this could represent the inherent ability of a node to attract new edges, a mechanism that is superimposed on standard preferential attachment.
14.
Community: A group of nodes in a network that are, in some sense, densely connected to other nodes in the community but sparsely connected to nodes outside the community.
15.
Community detection: The set of methods and techniques developed fairly recently for finding communities in a given network (graph). The number of communities is usually not specified a priori but, instead, needs to be determined from data.
16.
Critical point: The value of a control parameter in a statistical mechanical system at which the system exhibits critical behavior: previously localized phenomena become correlated throughout the system, which at this point behaves as a single entity.
17.
Phase diagram: A diagram displaying the phase (liquid, gas, etc.) of the system as one or more thermodynamic control parameters (temperature, pressure, etc.) are varied.
18.
Phase transition: Thermodynamic properties of a system are continuous functions of the thermodynamic parameters within a phase; phase transitions (e.g., liquid to gas) happen between phases where thermodynamic functions are discontinuous.
19.
Network diameter: The length of the longest of the shortest paths in the network, where a shortest path is computed for each dyad (node pair).
20.
Hysteresis: The behavior of a system depends not only on its current state but also on its previous state or states.
21.
Quality function: Typically a real-valued function with a high-dimensional domain that specifies the “goodness” of, say, a given network partitioning. For example, given the community assignments of N nodes, which can be seen as a point in an N-dimensional hypercube, the standard modularity quality function returns a number indicating how good the given partitioning is.
22.
Dynamic process: Any process that unfolds on a network over time according to a set of prespecified rules, such as epidemic processes, percolation, diffusion, and synchronization.
23.
Slice: In the context of multislice community detection, refers to one graph in a collection of many within the same system, where a slice can capture the structure of a network at a given time (time-dependent slice), at a particular resolution level (multiscale slice), or can encode the structure of a network for one tie type when many are present (multiplex slice).
24.
Scale-free network: Network with a power-law (Pareto) degree distribution.
25.
Erdős-Rényi model: Also known as Poisson random graph (after the fact that the degree distribution in the model follows a Poisson distribution), Bernoulli random graph (after the fact that each edge corresponds to an outcome of a Bernoulli process), or the random graph (as the progenitor of all random graphs). Starting with a fixed set of N nodes, one considers each node pair in turn independently of the other node pairs and connects the nodes with probability p. Erdős and Rényi first published the model in 1959, although Solomonoff and Rapoport published a similar model earlier in 1951.
26.
Watts-Strogatz model: A now-canonical model introduced by Watts and Strogatz in 1998. Starting from a regular lattice structure characterized by high clustering and long paths, the model shows how randomly rewiring only a small fraction of edges (or, alternatively, adding a small number of randomly placed edges) leads to a small-world network characterized by high clustering and short paths. The model is conceptually appealing and shows how to interpolate, using just one parameter, from a regular lattice structure at one extreme to an Erdős-Rényi graph at the other.
27.
Mean-field approximation: Sometimes called the zero-order approximation, this approximation replaces the value of a random variable by its average, thus ignoring any fluctuations (deviations) from the average that may actually occur. This approach is commonly used in statistical physics.
28.
Ensemble: A collection of objects, such as networks, that have been generated with the same set of rules, where each object in the ensemble has a certain probability associated with it. For example, one could consider the ensemble of networks that consist of six nodes and two edges, each being equiprobable.
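
The six-node, two-edge ensemble in the last entry is small enough to enumerate exhaustively. The following plain-Python sketch (standard library only; all names are illustrative) lists every member, gives each probability 1/105, and computes one ensemble average: the probability that the two edges share a node. Exact fractions are used so the result can be checked by hand: each of the 6 nodes is an endpoint of 5 possible edges, giving 6 × C(5, 2) = 60 adjacent edge pairs out of 105.

```python
from fractions import Fraction
from itertools import combinations

n_nodes, n_edges = 6, 2

# All possible undirected edges (node pairs): C(6, 2) = 15
dyads = list(combinations(range(n_nodes), 2))

# Every network on the six nodes with exactly two edges: C(15, 2) = 105
ensemble = list(combinations(dyads, n_edges))

# Each member of the ensemble is equiprobable
prob = Fraction(1, len(ensemble))


def edges_share_node(graph):
    """True if the two edges of `graph` have an endpoint in common."""
    (a, b), (c, d) = graph
    return len({a, b, c, d}) < 4


# Ensemble average of an indicator statistic = its probability under the ensemble
p_share = sum(prob for g in ensemble if edges_share_node(g))
print(len(dyads), len(ensemble), p_share)  # prints: 15 105 4/7
```

Because the number of edges is fixed, this ensemble differs from the Erdős-Rényi construction described above, in which each node pair is connected independently with probability p and the number of edges varies from one network to the next.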

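The modularity formula can be translated directly into code. The sketch below assumes an undirected, unweighted network stored as a 0/1 adjacency matrix and sums over all ordered pairs (i, j), including i = j, as the formula is written; the example network (two triangles joined by a single edge) and the helper names are hypothetical. Exact fractions keep the arithmetic checkable, and the double loop over all node pairs is meant for illustration, not for large networks.

```python
from fractions import Fraction


def adjacency_from_edges(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph with n nodes."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj


def modularity(adj, communities):
    """Standard modularity Q of a partition of an undirected, unweighted graph.

    Q = (2m)^-1 * sum_{i,j} (A_ij - k_i k_j / (2m)) * delta(c_i, c_j),
    summed over all ordered pairs (i, j), including i == j.
    """
    n = len(adj)
    k = [sum(row) for row in adj]  # node degrees
    two_m = sum(k)                 # 2m = sum of all degrees
    q = Fraction(0)
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:  # Kronecker delta
                q += adj[i][j] - Fraction(k[i] * k[j], two_m)
    return q / two_m


# Hypothetical network: triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = adjacency_from_edges(6, edges)

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_single = modularity(adj, [0] * 6)            # all nodes in one community
print(q_split, q_single)
```

For the two-triangle partition, each community contains 3 edges and has total degree 7, with m = 7, so Q = 2 × (3/7 − (7/14)²) = 5/14 ≈ 0.357. Placing all nodes in a single community gives Q = 0.
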
About this entry

Cite this entry

O’Malley, A.J., Onnela, JP. (2019). Introduction to Social Network Analysis. In: Levy, A., Goring, S., Gatsonis, C., Sobolev, B., van Ginneken, E., Busse, R. (eds) Health Services Evaluation. Health Services Research. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-8715-3_37

DOI: https://doi.org/10.1007/978-1-4939-8715-3_37
Published: 12 February 2019
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-8714-6
Online ISBN: 978-1-4939-8715-3
eBook Packages: Medicine, Reference Module Medicine
