Introduction to Social Network Analysis

  • Reference work entry
Health Services Evaluation

Part of the book series: Health Services Research ((HEALTHSR))

Abstract

This chapter introduces statistical methods used in the analysis of social networks and in the rapidly evolving parallel field of network science. Although several instances of social network analysis in health services research have appeared recently, the majority involve only the most basic methods and thus only scratch the surface of what might be accomplished. Cutting-edge methods are presented using relevant examples and illustrations from health services research.

Acknowledgments

The time and effort of Dr. O’Malley and Dr. Onnela on researching and developing this chapter was supported by NIH/NIA grant P01 AG031093 and Robert Wood Johnson Award #58729. The authors thank Mischa Haider, Brian Neelon, and Bruce E Landon for reviewing an early draft of the manuscript and providing several useful comments and suggestions.

Author information

Corresponding author

Correspondence to Alistair James O’Malley.

Glossary of Terms

To help readers familiar with social networks understand the network science component of the chapter, and conversely to help readers familiar with network science understand the social network component, the following glossary contains a comprehensive list of terms and definitions.

Terms Used in Social Networks

  1. 1.

    Social network: A collection of actors and the (social) relationships or ties linking them.

  2. 2.

    Relationship, Tie: A link or connection between two actors.

  3. 3.

    Dyad: A pair of actors in a network and the relationship(s) between them: two relationships per measure in a directed network and one relationship per measure in an undirected network.

  4. 4.

    Triad: A set of three actors in the network and the relationships between them.

  5. 5.

    Scale or valued relationship: A nonbinary relationship between two actors (e.g., the level of a trait). We focused on binary relationships in the chapter.

  6. 6.

    Directed network: A network in which the relationship from actor i to actor j need not be the same as that from actor j to actor i.

  7. 7.

    Nondirected network: A network in which the state of the relationship from actor i to actor j equals the state of the relationship from actor j to actor i.

  8. 8.

    Sociocentric network data: The complete set of observations on the n(n − 1) relationships in a directed network, or n(n − 1)/2 relationships in an undirected network, with n actors.

  9. 9.

    Collaboration network: A network whose ties represent the actors’ joint involvement on a task (e.g., work on a paper) or a common experience (e.g., treating the same episode of health care for a patient).

  10. 10.

    Bipartite: Relationships are only permitted between actors of two different types.

  11. 11.

    Unipartite: Relationships are permitted between all types of actors.

  12. 12.

    Social contagion, Social influence, Peer effects: Terms used to describe the phenomenon whereby an actor’s trait changes due to their relationship with other actors and the traits of those actors.

  13. 13.

    Mutable trait: A characteristic of an actor that can change state.

  14. 14.

    Social selection: The phenomenon whereby the relationship status between two actors depends on their characteristics, as occurs with homophily and heterophily.

  15. 15.

    Homophily: A preference for relationships with actors who have similar characteristics. Popularly referred to as “birds of a feather flock together.”

  16. 16.

    Heterophily: A preference for relationships with actors who have different characteristics. Popularly referred to as “opposites attracting.”

  17. 17.

    In-degree, Popularity: The number of actors who initiated a tie with the given actor.

  18. 18.

    Out-degree, Expansiveness, Activity: The number of ties the given actor initiates with other actors.

  19. 19.

    k-star: A subnetwork in which the focal actor has ties to k other actors.

  20. 20.

    k-cycle: A subnetwork in which each actor has degree 2 that can be arranged as a ring (i.e., a k-path through the actors returns to its origin without backtracking). For example, the ties A-B, B-C, and C-A form a three-cycle.

  21. 21.

    k degrees of separation: Two individuals linked by a k-path (k − 1 intermediary actors) that are not connected by any path of length k − 1 or less.

  22. 22.

    Density: The overall tendency of ties to form in the network. A descriptive measure is given by the number of ties in the network divided by the total number of possible ties.

  23. 23.

    Reciprocity: The phenomenon whereby an actor i is more likely to have a tie with actor j if actor j has a tie with actor i. Only defined for directed networks.

  24. 24.

    Clustering: The tendency of ties to cluster and form densely connected regions of the network.

  25. 25.

    Closure: The tendency for network configurations to be closed.

  26. 26.

    Transitivity: The tendency for a tie from individual A to individual B to form if ties from individual A to individual C and from individual C to individual B exist. A form of triadic closure commonly stated as “a friend of a friend is a friend.” Reduces to general triadic closure in an undirected network.

  27. 27.

    Centrality: A dimensionless measure of an actor’s position in the network. Higher values indicate more central positions. There are numerous measures of centrality. Four common ones are degree, closeness, betweenness, and eigenvalue centrality. Degree and eigenvalue centrality are extremes in that degree centrality is determined solely from an actor’s degree (it is internally focused) while eigenvalue centrality is based on the centrality of the actors connected to the focal actor (it is externally focused).

  28. 28.

    Structural balance: A theory which suggests actors seek balance in their relationships; for example, if A likes B and B likes C then A will endeavor to like C as well to keep the system balanced. Thus, the existence of transitivity is implied by structural balance.

  29. 29.

    Structural equivalence: The network configuration (arrangement of ties) around one actor is similar to that of another actor. Even though actors may not be connected, they can still be in structurally similar situations.

  30. 30.

    Structural power: An actor in a dominant position in the network. Such an actor may be one in a strategic position, such as the only bridge between otherwise distinct components.

  31. 31.

    Network component: A subset of actors having no ties external to themselves.

  32. 32.

    Graph theory: The mathematical basis under which theoretical results for networks are derived and empirical computations are performed.

  33. 33.

    Digraph: A graph in which edges can be bidirectional. Unlike social networks, digraphs can contain self-ties. Graphs lie in two-dimensional space.

  34. 34.

    Hypergraph: A graph in dimension three or higher.

  35. 35.

    Maximal subset: A set of actors for whom all ties are intact in a binary network (i.e., has density 1.0). If the set contains k actors, the maximal subset is referred to as a k-clique.

  36. 36.

    Scalar, vector, matrix: Terms from linear and abstract algebra. A scalar is a 1 × 1 matrix, a vector is a k × 1 matrix, and a matrix is k × p, where k, p > 1.

  37. 37.

    Adjacency matrix: A matrix whose off-diagonal elements contain the value of the relationship from one actor to another. For example, element ij contains the relationship from actor i to actor j. The diagonal elements are zero by definition.

  38. 38.

    Matrix transpose: The operation whereby element ij is exchanged with element ji for all i, j.

  39. 39.

    Row stochastic matrix: A matrix whose rows sum to 1 and contain nonnegative elements. Thus, each row represents a probability distribution of a discrete-valued random variable.

  40. 40.

    Random variable: A variable whose value is not known with certainty. It can relate to an event or time period that is yet to occur, or it can be a quantity whose value is fixed (i.e., has occurred) but is unknown.

  41. 41.

    Parametric: A term used in statistics to describe a model with a specific functional form (e.g., linear, quadratic, logarithmic, exponential) indexed by unknown parameters or an estimation procedure that relies on specification of the complete distribution of the data.

  42. 42.

    Nonparametric: A model or estimation procedure that makes no assumption about the specific form of the relationship between key variables (e.g., whether the predictors have linear or additive effects on the outcome) and does not rely upon complete specification of the distribution of the data for estimation.

  43. 43.

    Outcome, Dependent variable: The variable considered causally dependent on other variables of interest. This will typically be a variable whose value is believed to be caused by other variables.

  44. 44.

    Independent, Predictor, Explanatory variable, Covariate: A variable believed to be a cause of the outcome.

  45. 45.

    Contextual variable: A variable evaluated on the neighbors of, or other members of a set containing, the focal actor. For example, the proportion of females in a neighboring county, the proportion of friends with college degrees.

  46. 46.

    Interaction effect: The extent to which the effect of one variable on the outcome varies across the levels of another variable.

  47. 47.

    Endogenous variable: A variable (or an effect) that is internal to a system.

    Predictors in a regression model that are correlated with the unobserved error are endogenous; they are determined by an internal as opposed to an external process. By definition outcome variables are endogenous.

  48. 48.

    Exogenous variable: A variable (or an effect) that is external to the system in that its value is not determined by other variables in the system. Predictors that are independent of the error term in a regression model are exogenous.

  49. 49.

    Instrumental variable (IV): A variable with a non-null effect on the endogenous predictor whose causal effect is of interest (the “treatment”) that has no effect on the outcome other than that through its effect on treatment. Often-used sufficient conditions for the latter are that the IV is (i) marginally independent of any unmeasured confounders and (ii) conditionally independent of the outcome given the treatment and any unmeasured confounders. In an IV analysis a set of observed predictors may be conditioned on as long as they are not effects of the treatment and the IV assumptions hold conditional on them. While subject to controversy, IV methods are one of the only methods of estimating the true (causal) effect of an endogenous predictor on an outcome.

  50. 50.

    Linear regression model: A model in which the expected value of the outcome (or dependent variable) conditional on one or more predictors (or explanatory variables) is a linear combination of the predictors (an additive sum of the predictors multiplied by their regression coefficients) and an unobserved random error.

  51. 51.

    Longitudinal model: A model that describes variation in the outcome variable over time as a function of the predictors, which may include prior (i.e., lagged) values of the outcome. Observations are typically only available at specific, but not necessarily equally spaced, times. Longitudinal models make the direction of causality explicit. Therefore, they can distinguish between the association between the predictors and the outcome and the effect of a change in the predictor on the change in the outcome.

  52. 52.

    Cross-sectional model: A model of the relationship between the values of the predictors and outcomes at a given time. Because one cannot discern the direction of causality, cross-sectional models are more difficult to defend as causal.

  53. 53.

    Stochastic block model: A conditional dyadic independence model in which the density and reciprocity effects differ between blocks defined by attributes of the actors comprising the network. For example, blocks for gender accommodate different levels of connectedness and reciprocity for men and women.

  54. 54.

    Logistic regression: A member of the exponential family of models that is specific to binary outcomes. It utilizes a link function that maps expected values of the outcome onto an unrestricted scale to ensure that all predictions from the model are well-defined.

  55. 55.

    Multinomial distribution: A generalization of the binomial distribution to three or more categories. The sum of the probabilities of each category equals 1.

  56. 56.

    Exponential random graph model: A model in which the state of the entire network is the dependent variable. Provides a flexible approach to accounting for various forms of dependence in the network. Not amenable to causal modeling.

  57. 57.

    Degeneracy: An estimation problem encountered with exponential random graph models in which the fitted model might reproduce observed features of the network on average but each individual draw bears no resemblance to the observed network. Often degenerate draws are empty or complete graphs.

  58. 58.

    Latent distance model: A model in which the statuses of dyads are independent conditional on the positions of the actors, and thus the distance between them, in a latent social space.

  59. 59.

    Latent eigenmodel: A model in which the statuses of dyads are independent conditional on the product of the (weighted) latent positions of the actors in the dyad.

  60. 60.

    Latent variable: An unobserved random variable. Random effects and pure error terms are latent variables.

  61. 61.

    Latent class: An unobserved categorical random variable. Actors with the same value of the variable are considered to be in the same latent class.

  62. 62.

    Factor analysis: A statistical technique used to decompose the correlation (or covariance) matrix of a set of random variables into groups of related items.

  63. 63.

    Generalized estimating equation (GEE): A statistical method that corrects estimation errors for dependent observations without necessarily modeling the form of the dependence or specifying the full distribution of the data.

  64. 64.

    Random effect: A parameter for the effect of a unit (or cluster) that is drawn from a specified probability distribution. Treating the unit effects as random draws from a common probability distribution allows information to be pooled across units for the estimation of each unit-specific parameter.

  65. 65.

    Fixed effect: A parameter in a model that reflects the effect of an actor belonging to a given unit (or cluster). By virtue of modeling the unit effects as unrelated parameters, no information is shared between units and so estimates are based only on information within the unit.

  66. 66.

    Ordinary least squares: A commonly used method for estimating the parameters of a regression model. The objective is to minimize the sum of squared distances between the fitted values of the model and the observed values of the dependent variable.

  67. 67.

    Maximum likelihood: A method of estimating the parameters of a statistical model that typically embodies parametric assumptions. The procedure is to seek the values of the parameters that maximize the likelihood function of the data.

  68. 68.

    Likelihood function: An expression that quantifies the total information in the data as a function of model parameters.

  69. 69.

    Markov chain Monte Carlo: A numerical procedure used to fit Bayesian statistical models.

  70. 70.

    Steady state: The state-space distribution of a Markov chain describes the long-run proportion of time the random variable being modeled is in each state. Often Markov chains iterate through a transient phase in which the current state of the chain depends less and less on the initial state of the chain. The steady state phase occurs when successive samples have the same distribution (i.e., there is no dependence on the initial state).

  71. 71.

    Collinearity: The correlation between two predictors after conditioning on the other observed predictors (if any). When predictors are collinear, distinguishing their effects is difficult, and the statistical properties of the estimated effects are more sensitive to the validity of the model.

  72. 72.

    Normal distribution: Another name for the Gaussian distribution. Has a bell-shaped probability density function.

  73. 73.

    Covariance matrix: A matrix in which the ijth element contains the covariance of items i and j.

  74. 74.

    Absolute or Geodesic distance: The total distance along the edges of the network from one actor to another.

  75. 75.

    Cartesian distance: The distance between two points on a two-dimensional surface or grid. Adheres to the Pythagorean theorem.

  76. 76.

    Count data: Observations made on a variable with the whole numbers (0, 1, 2, …) as its state space.

  77. 77.

    Statistical inference: The process of establishing the level of certainty of knowledge about unknown parameters (or hypotheses) from data subject to random variation, such as when observations are measured imperfectly with no systematic bias or a sample from a population of interest is used to estimate population parameters.

  78. 78.

    Null model: A model of a network statistic that typically represents what would be expected if the feature of interest were nonexistent (effect equal to 0) or outside the range of interest.

  79. 79.

    Permutation test: A statistical test of a null hypothesis against an alternative implemented by randomly reshuffling the labels (i.e., the subscripts) of the observations. The significance level of the test is evaluated by resampling the observed data 50–100 times and computing the proportion of times that the test is rejected.
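Several of the descriptive terms in the glossary above (in-degree, out-degree, density, reciprocity, and transitivity) can be computed directly from a binary adjacency matrix. The following is a minimal sketch in Python, not code from the chapter. The four-actor directed network and all function names are hypothetical, and reciprocity and transitivity are computed as simple descriptive proportions, which is only one of several possible operationalizations.

```python
def out_degrees(adj):
    """Out-degree: number of ties each actor initiates (row sums)."""
    return [sum(row) for row in adj]


def in_degrees(adj):
    """In-degree: number of ties each actor receives (column sums)."""
    n = len(adj)
    return [sum(adj[i][j] for i in range(n)) for j in range(n)]


def density(adj):
    """Observed ties divided by the n(n - 1) possible directed ties."""
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


def reciprocity(adj):
    """Share of ties i -> j that are returned by a tie j -> i."""
    n = len(adj)
    ties = mutual = 0
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j]:
                ties += 1
                mutual += adj[j][i]
    return mutual / ties if ties else 0.0


def transitivity(adj):
    """Share of two-paths A -> C -> B that are closed by a tie A -> B."""
    n = len(adj)
    two_paths = closed = 0
    for a in range(n):
        for c in range(n):
            if c == a or not adj[a][c]:
                continue
            for b in range(n):
                if b in (a, c) or not adj[c][b]:
                    continue
                two_paths += 1
                closed += adj[a][b]
    return closed / two_paths if two_paths else 0.0


# Hypothetical directed network: element [i][j] is the tie from actor i to actor j.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

print(out_degrees(adj), in_degrees(adj))
print(density(adj), reciprocity(adj), transitivity(adj))
```

For this illustrative network there are 5 ties out of 12 possible, 2 of the 5 ties are reciprocated, and 2 of the 4 two-paths are closed.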
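The glossary entries for the row stochastic matrix and the contextual variable fit together naturally: dividing each row of an adjacency matrix by its row sum gives weights that can be used to average a trait over an actor's contacts. The sketch below illustrates this under stated assumptions. The network, the binary trait (for example, holding a college degree), and the function names are all hypothetical, and actors with no outgoing ties are given a row of zeros, so the matrix is row stochastic only for actors with at least one tie.

```python
def row_normalize(adj):
    """Divide each row by its sum so that nonempty rows sum to 1."""
    weights = []
    for row in adj:
        total = sum(row)
        if total:
            weights.append([value / total for value in row])
        else:
            # Assumption: an actor with no ties gets an all-zero row.
            weights.append([0.0] * len(row))
    return weights


def peer_average(weights, trait):
    """Contextual variable: weighted average of the trait among each actor's contacts."""
    return [sum(w * t for w, t in zip(row, trait)) for row in weights]


# Hypothetical directed network and binary trait for four actors.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
trait = [1, 0, 1, 0]

weights = row_normalize(adj)
peer = peer_average(weights, trait)
print(peer)
```

Here the first actor names two contacts, one of whom has the trait, so the contextual variable for that actor is 0.5.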
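The permutation test and homophily entries in the glossary above can be combined in a small illustration: reshuffle the actors' labels and ask how often the reshuffled network shows at least as many same-label ties as the observed one. This is a sketch of one common Monte Carlo variant, not the chapter's procedure. The network (two groups of five fully connected actors joined by a single tie), the labels, the number of reshuffles, and the function names are all assumptions made for illustration.

```python
import random
from itertools import combinations


def same_label_share(edges, labels):
    """Proportion of ties that join two actors with the same label."""
    same = sum(1 for i, j in edges if labels[i] == labels[j])
    return same / len(edges)


def permutation_p_value(edges, labels, n_shuffles=1000, seed=0):
    """Estimate how often reshuffled labels give a share at least as large as observed."""
    rng = random.Random(seed)
    observed = same_label_share(edges, labels)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # reassign labels to actors at random
        if same_label_share(edges, shuffled) >= observed:
            at_least += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least + 1) / (n_shuffles + 1)


# Hypothetical undirected network: two cliques of five actors plus one bridging tie.
edges = (
    list(combinations(range(5), 2))
    + list(combinations(range(5, 10), 2))
    + [(4, 5)]
)
labels = ["F"] * 5 + ["M"] * 5

observed, p_value = permutation_p_value(edges, labels)
print(observed, p_value)
```

Because the labels line up almost perfectly with the two cliques, very few reshuffles match the observed share, and the estimated p-value is small. The number of reshuffles is a tuning choice; more reshuffles give a more precise estimate.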

Terms Used in Network Science

  1. 1.

    Network science: The approach developed from 1995 onwards mostly within statistical physics and applied mathematics to study networked systems across many domains (e.g., physical, biological, and social). Usually focuses on very large systems; hence, theoretical results derived in the thermodynamic limit are good approximations to real-world systems.

  2. 2.

    Thermodynamic limit: In statistical physics refers to the limit obtained for any quantity of interest as system size N tends to infinity. Many analytical results within network science are derived in this limit due to analytical tractability.

  3. 3.

    Statistical physics: The branch of physics dealing with many-body systems where the particles in the system obey a fixed set of rules, such as Newtonian mechanics, quantum mechanics, or any other rule set. As the number of bodies (particles) in a system grows, it becomes increasingly difficult (and less informative) to write down the equations of motion, a set of differential equations that govern the motion of the particles over time, for the system. However, one can describe these systems probabilistically. The word “statistical” is somewhat misleading as there is no statistics in the sense of statistical inference involved; instead everything proceeds from a set of axioms, suggesting that “probabilistic” might be a better term. Statistical physics, also called statistical mechanics, gives a microscopic explanation to the phenomena that thermodynamics explains phenomenologically.

  4. 4.

    Generative model: Most network models within network science belong to this category. Here one specifies the microscopic rules governing, for example, the attachment of new nodes to the existing network structure in models of network growth.

  5. 5.

    Cumulative advantage: A stylized modeling mechanism introduced by Price in 1976 to capture phenomena where “success breeds success.” Price applied the model to study citation patterns where power-law or power-law-like distributions are observed for the distribution of the number of citations and successfully reproduced by the model.

  6. 6.

    Polya urn model: A stylized sampling model in probability theory where the composition of the system, the contents of the urn, changes as a consequence of each draw from the urn.

  7. 7.

    Power law: Refers to the specific functional form \( P(x) \sim x^{-\alpha} \) of the distribution of quantity x. Also called Pareto distribution. See scale-free network.

  8. 8.

    Preferential attachment: A stylized modeling mechanism introduced by Barabási and Albert in 1999 where the probability of a new node to attach itself to an existing node i of degree \( k_i \) is an increasing function of \( k_i \); in the case of linear preferential attachment, this probability is directly proportional to \( k_i \). In short, the higher the degree of a node, the higher the rate at which it acquires new connections (increases its degree).

  9. 9.

    Weak ties hypothesis: A hypothesis developed by sociologist Mark Granovetter in his extremely influential 1973 paper “The strength of weak ties.” The hypothesis, in short, states the following: The stronger the tie connecting persons A and B, the higher the fraction of friends they have in common.

  10. 10.

    Modularity: Modularity is a quality function used in network community detection, where its value is maximized (in principle) over the set of all possible partitions of the network nodes into communities. Standard modularity reads as \( Q = (2m)^{-1} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j) \) where \( c_i \) is the community assignment of node i and \( \delta \) is the Kronecker delta; other quantities as defined in the text.

  11. 11.

    Rate equations: Rate equations, commonly used to model chemical reactions, are similar to master equations but instead of modeling the count of objects (e.g., number of nodes) in a collection of discrete states (e.g., the number of k-degree nodes \( N_k(t) \) for different values of k), they are used to model the evolution of continuous variables, such as average degree, over time.

  12. 12.

    Master equations: Widely used in statistical physics, these differential equations model how the state of the system changes from one time point to the next. For example, if \( N_k(t) \) denotes the number of nodes of degree k, given the model, one can write down the equation for \( N_k(t + 1) \), i.e., the number of k-degree nodes at time t + 1.

  13. 13.

    Fitness or affinity or attractiveness: A node attribute introduced to incorporate heterogeneity in the node population in a growing network model. For example, in a model based on preferential attachment, this could represent the inherent ability of a node to attract new edges, a mechanism that is superimposed on standard preferential attachment.

  14. 14.

    Community: A group of nodes in a network that are, in some sense, densely connected to other nodes in the community but sparsely connected to nodes outside the community.

  15. 15.

    Community detection: The set of methods and techniques developed fairly recently for finding communities in a given network (graph). The number of communities is usually not specified a priori but, instead, needs to be determined from data.

  16. 16.

    Critical point: The value of a control parameter in a statistical mechanical system where the system exhibits critical behavior: previously localized phenomena now become correlated throughout the system which at this point behaves as one single entity.

  17. 17.

    Phase diagram: A diagram displaying the phase (liquid, gas, etc.) of the system as one or more thermodynamic control parameters (temperature, pressure, etc.) are varied.

  18. 18.

    Phase transition: Thermodynamic properties of a system are continuous functions of the thermodynamic parameters within a phase; phase transitions (e.g., liquid to gas) happen between phases where thermodynamic functions are discontinuous.

  19. 19.

    Network diameter: The longest of the shortest pairwise paths in the network, computed for each dyad (node pair).

  20. 20.

    Hysteresis: The behavior of a system depends not only on its current state but also on its previous state or states.

  21. 21.

    Quality function: Typically a real-valued function with a high-dimensional domain that specifies the “goodness” of, say, a given network partitioning. For example, given the community assignments of N nodes, which can be seen as a point in an N-dimensional hypercube, the standard modularity quality function returns a number indicating how good the given partitioning is.

  22. 22.

    Dynamic process: Any process that unfolds on a network over time according to a set of prespecified rules, such as epidemic processes, percolation, diffusion, synchronization, etc.

  23. 23.

    Slice: In the context of multislice community detection, refers to one graph in a collection of many within the same system, where a slice can capture the structure of a network at a given time (time-dependent slice), at a particular resolution level (multiscale slice), or can encode the structure of a network for one tie type when many are present (multiplex slice).

  24. 24.

    Scale-free network: Network with a power-law (Pareto) degree distribution.

  25. 25.

    Erdős-Rényi model: Also known as Poisson random graph (after the fact that the degree distribution in the model follows a Poisson distribution), Bernoulli random graph (after the fact that each edge corresponds to an outcome of a Bernoulli process), or the random graph (as the progenitor of all random graphs). Starting with a fixed set of N nodes, one considers each node pair in turn independently of the other node pairs and connects the nodes with probability p. Erdős and Rényi first published the model in 1959, although Solomonoff and Rapoport published a similar model earlier in 1951.

  26. 26.

    Watts-Strogatz model: A now canonical model by Watts and Strogatz that was introduced in 1998. Starting from a regular lattice structure characterized by high clustering and long paths, the model shows how randomly rewiring only a small fraction of edges (or, alternatively, adding a small number of randomly placed edges) leads to a small-world network characterized by high clustering and short paths. The model is conceptually appealing, and shows how to interpolate, using just one parameter, from a regular lattice structure in one extreme to an Erdős-Rényi graph in the other.

  27. 27.

    Mean-field approximation: Sometimes called the zero-order approximation, this approximation replaces the value of a random variable by its average, thus ignoring any fluctuations (deviations) from the average that may actually occur. This approach is commonly used in statistical physics.

  28. 28.

    Ensemble: A collection of objects, such as networks, that have been generated with the same set of rules, where each object in the ensemble has a certain probability associated with it. For example, one could consider the ensemble of networks that consists of six nodes and two edges, each being equiprobable.
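The standard modularity formula in the glossary above can be evaluated directly for a given partition. The sketch below does so for a hypothetical undirected network of two triangles joined by a single edge. The network, the community assignments, and the function names are illustrative assumptions, and the code only scores a given partition; it does not search over partitions as a community detection method would.

```python
def adjacency_from_edges(n, edges):
    """Build a symmetric binary adjacency matrix for an undirected network."""
    adj = [[0] * n for _ in range(n)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj


def modularity(adj, communities):
    """Q = (2m)^-1 * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j)."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    two_m = sum(degrees)  # each undirected edge is counted twice
    total = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:  # Kronecker delta
                total += adj[i][j] - degrees[i] * degrees[j] / two_m
    return total / two_m


# Hypothetical network: triangle 0-1-2, triangle 3-4-5, and a bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = adjacency_from_edges(6, edges)

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])  # everything in one community
print(q_split, q_single)
```

Assigning each triangle to its own community gives Q = 5/14, whereas placing all nodes in a single community gives Q = 0, consistent with the idea that higher modularity indicates a better partition.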
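Two of the generative models described in the glossary above, the Erdős-Rényi model and linear preferential attachment, are simple enough to sketch in a few lines. The code below is a simplified illustration rather than a reference implementation: the function names are hypothetical, the growth model starts from a single edge, and each new node adds exactly one edge (an assumption made here for brevity).

```python
import random
from itertools import combinations


def erdos_renyi(n, p, rng):
    """Consider each node pair independently and connect it with probability p."""
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]


def preferential_attachment(n, rng):
    """Grow a network in which each new node attaches one edge to an existing
    node chosen with probability proportional to that node's degree."""
    edges = [(0, 1)]
    # Each node appears in this list once per unit of degree, so a uniform
    # draw from it selects a node with probability proportional to degree.
    endpoints = [0, 1]
    for new_node in range(2, n):
        target = rng.choice(endpoints)
        edges.append((new_node, target))
        endpoints.extend([new_node, target])
    return edges


rng = random.Random(42)
er_edges = erdos_renyi(100, 0.05, rng)
pa_edges = preferential_attachment(100, rng)
print(len(er_edges), len(pa_edges))
```

In the growth model a network of n nodes always has n − 1 edges, while the number of edges in the Erdős-Rényi graph varies from draw to draw around p · n(n − 1)/2.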
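The network diameter entry above, and the related notions of geodesic distance and degrees of separation, can be illustrated with a breadth-first search. The sketch below assumes a connected, undirected network with binary ties, so that the geodesic distance is the number of ties on a shortest path. The five-actor network and the function names are hypothetical.

```python
from collections import deque


def geodesic_distances(neighbors, source):
    """Shortest path length (in ties) from source to every reachable actor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def diameter(neighbors):
    """Longest of the shortest pairwise paths; requires a connected network."""
    longest = 0
    for source in neighbors:
        dist = geodesic_distances(neighbors, source)
        if len(dist) < len(neighbors):
            raise ValueError("network is not connected")
        longest = max(longest, max(dist.values()))
    return longest


# Hypothetical undirected network stored as adjacency lists.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

print(geodesic_distances(neighbors, "A"))
print(diameter(neighbors))
```

In this example actors A and E are three degrees of separation apart, which is also the diameter of the network.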

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

O’Malley, A.J., Onnela, JP. (2019). Introduction to Social Network Analysis. In: Levy, A., Goring, S., Gatsonis, C., Sobolev, B., van Ginneken, E., Busse, R. (eds) Health Services Evaluation. Health Services Research. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-8715-3_37

  • DOI: https://doi.org/10.1007/978-1-4939-8715-3_37

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4939-8714-6

  • Online ISBN: 978-1-4939-8715-3

  • eBook Packages: Medicine, Reference Module Medicine
