Introduction

Social complexity is a much used concept in behavioural ecology (Kappeler 2019, Topical collection on Social complexity). However, definitions vary widely and, often, are not operationalized. Measures of social complexity have been sought and used for a variety of reasons, perhaps most notably to test the social intelligence hypothesis for the evolution of cognition (Kwak et al. 2018; Kappeler 2019, Topical collection on Social complexity) and the social complexity hypothesis for the evolution of communication (Freeberg et al. 2012).

In studies of non-human societies, the term social complexity has primarily been used in two broad ways. First, social complexity is used to describe the number of different types (roles) of individuals that make up a social group (e.g., Blumenstein and Armitage 1998; Groenewoud et al. 2016). Second, social complexity is used to describe the complexity of social relationships among individuals within a social group or population (e.g., Fischer et al. 2017). Recent work has highlighted the importance of considering these two aspects of social complexity separately. These two types of complexity appear to evolve under different patterns of local relatedness (Lukas and Clutton-Brock 2018). In social mammals, complex social relationships are associated with groups that have low relatedness, while members of groups composed of close relatives are more likely to show a diversity of roles (Lukas and Clutton-Brock 2018). While both aspects of social complexity have important implications, it is the measurement of the complexity of social relationships that we attempt to address here.

To have utility, measures of social complexity should be comparable across populations within species, as well as across species, perhaps within some higher taxon. This is challenging. Populations are typically of different sizes, demographics and may use space and interact socially in different ways. Furthermore, they are studied with different protocols and with differing intensities. Ideally, we seek a measure that is as follows: (a) unaffected by network size, so the social complexity calculated from a full social network is similar to that calculated from any substantial random portion of it; (b) little influenced by the addition of distantly connected individuals into the study network; (c) not biased high (suggesting false complexity) by sampling issues; and (d) not biased low (obscuring complexity) by low-intensity sampling. Measures of social complexity can potentially be multidimensional, with different dimensions capturing elements of the concept (e.g., Whitehead 2008; Fischer et al. 2017).

There have been two general perspectives to measuring social complexity using network data. The top-down approach looks at complexity as a network property, using measures such as size, diameter, modularity, dimensional coupling, disparity and computational complexity (Butts 2001; Whitehead 2008). These measures tend to be affected by network delineation, thus causing problems with issues (a) and (b) outlined previously. Indeed, these problems are common to many attempts to develop measures to compare the structure of social networks (Faust 2006).

An alternative, bottom-up, perspective, is to consider social complexity from the perspective of the members of a social network. Hinde (1976) defined social structure as the “nature, quality, and patterning of relationships”. Then, social complexity can be thought of as the complexity of dyadic relationships. If we operationalize relationships using “relationship measures”, such as interaction rates and association indices (Whitehead 2008), these can be used to estimate social complexity. Bergman and Beehner (2015) suggest a simple definition of social complexity as “the number of differentiated relationships that individuals have”. A good example of this relationship-based approach to social complexity, which builds on Bergman and Beehner’s (2015) ideas, is Fischer et al.’s (2017) method. Using detailed observations of affiliative and agonistic interactions, each dyadic relationship is quantified, and, then, these are clustered into one of four relationship classes. Social complexity is quantified using the diversity of relationships experienced by an individual, and individual-level complexities are aggregated into measures of group complexity. While Fischer et al.’s (2017) method is an appealing and rich approach, it depends on the availability of detailed data on direct social interactions (e.g., grooming and aggression), which are often difficult to observe in studies of the social structure of wild animals.

Many studies of social structure employ association indices, estimates of the proportion of time that a dyad is associated (Cairns and Schwager 1987). These association indices are used to infer the structure of social relationships within the population. Association indices (the “simple ratio index”, the “half-weight index”, etc.) are typically calculated as ratios: the number of times that the dyad was observed associating divided by the number of times that they could have been observed associating—a binomial process. Using this attribute of association indices, we introduce a method, which in some respects, parallels that of Fischer et al. (2017), for deriving a measure of social complexity, which we call association complexity, from association indices. We use binomial mixture models on association data to model the distribution of relationships within a population (see Fig. 1). The mixture models represent the associations as belonging to several classes, each with a mean strength of association and rate of occurrence within the population (McNicholas 2016). The mixture modelling finds how many classes are best supported by the data and, then, estimates these parameters. These are then input to a Shannon index of entropy (Shannon and Weaver 1949) to give a measure of diversity among the associations experienced by individuals, which we use to measure complexity.

Fig. 1
figure 1

Illustration of our dyadic concept of association complexity, illustrated for societies of low (a), medium (b) and high (c) complexities. Social networks (left) contain different numbers of relationship types (represented by edge colors), each with a unique distribution of true association indices (centre). We measure complexity as the uncertainty that an association is of a particular relationship type, visualised here as the sum of association indices of each type (right). A more even distribution of sums across more classes of association leads to greater uncertainty, resulting in higher values of S

Here, we first explain the method and, then, test it against simulated data. We explore the effects of sampling rate as well as within-class variability on our estimates of association complexity. Finally, we illustrate the process with real data and discuss potential extensions.

Methods

Binomial mixture models

We assume that each dyad, ij, has a real association index, Rij, that is the actual proportion of time that they are in association and that each Rij belongs to one of K relationship classes, though which class is unknown. So, for instance, there might be some tight “bonded” relationships with Rij = μ1 = 0.75, some pairs of “friends” with Rij = μ2 = 0.20 and some “casual acquaintances” with Rij = μ3 = 0.03.

Then, if the relationship between individual i and individual j is of class k (ij) (the classes, the ks, are labelled 1, 2, 3, …, K; each class with a real association index μk) and there are dij observation occasions, the number of observed associations, xij, is binomially distributed with sample size dij and probability μk(ij). Thus:

$$ {x}_{ij}\sim \mathrm{binomial}\ \left({d}_{ij},{\mu}_{k(ij)}\right) $$
(1)

We do not know K, the number of classes of relationship, the means for each class, {μk}, or the proportion of relationships in each class, {αk} [Σαk = 1]. However, mixture models allow us to estimate these parameters. Mixture models assume that an observed distribution is a mixture of several unknown distributions and estimate the nature and importance of these different components (McNicholas 2016). In our case, we are trying to dissect a distribution of relationship measures into its components, with each of the components representing a different class of relationship. The parameters [{μk}, {αk}] of the binomial mixture model are estimated using maximum likelihood via an expectation-maximization (EM) algorithm (see the Supplementary material for algorithm details). The number of classes, K, is estimated by fitting a set of candidate models with different values of K and choosing the best one based on criteria, such as the Bayesian Information Criteria (BIC), Akaike Information Criterion (AIC), or the Integrated Completed Likelihood (ICL) (McNicholas 2016). We calculate ICL as BIC + 2E, where E is the entropy of the classification matrix. Thus, ICL penalizes models in which the relationship class of dyads is uncertain.

Quantifying complexity

The mixture models suggest that relationships of class k occur with frequency αk and these dyads associate at a rate of μk (the strength of the association index). Thus, the frequency of associations in the population between two individuals with relationship class k is:

$$ {q}_k={\mu}_k.{\upalpha}_k/\sum {\mu}_k.{\upalpha}_k $$
(2)

Then, the diversity in association can be expressed by Shannon and Weaver’s (1949) entropy index:

$$ S=-\sum {q}_k.\mathrm{In}\left({q}_k\right) $$
(3)

And, this is our proposed measure of association complexity.

This measure has the desirable quality that, in general, social structures with more relationship classes will have a higher value of S. In addition, this measure also quantifies differences in the diversity of associations between social structures with the same number of relationship classes. A society will have higher complexity when the frequency with which classes occur decreases as the strength of association increases. Maximal complexity for a given number of classes is achieved when

$$ {\alpha}_k=\raisebox{1ex}{${\mu}_k^{-1}$}\!\left/ \!\raisebox{-1ex}{$\sum {\mu}_k^{-1}$}\right. $$
(4)

As under these conditions, associations of all classes are equally frequent. Deviations from Eq. (4) lead to differences in the frequency of associations of each class, which results in less diversity in association types. Societies with the same value of K can have very different values of S, and difference in values of K will not always reflect differences in S. Stated another way, S indicates the degree of uncertainty in the relationship class of a given association. As an example, consider three hypothetical societies, one with K = 5 and q = {0.2, 0.2, 0.2, 0.2, 0.2}, another with K = 5 and q = {0.9, 0.025, 0.025, 0.025, 0.025}, and a third with K = 2 and q = {0.5, 0.5}. The first two societies have the same number of relationship classes, but in the first, the frequency of associations of each class is the same, and thus, the diversity of associations is extremely high (S = 1.61), while in the second, one class dominates, reducing the association complexity (S = 0.47). Furthermore, while the third society has only two relationship classes, associations of both class are equally likely, leading to an estimate of complexity higher than the second society (S = 0.69). We illustrate the variation in S within and between values of K in our simulations (see subsequent texts).

Testing the method

We used simulated data to test our proposed method. We were particularly interested in which criterion to use for selecting the number of components (AIC, BIC, ICL), as well as how the sampling effort, indicated by the denominator of the association index (dij) might affect estimates of the number of classes of social relationship (K) and association complexity (S). In addition, we sought to more closely simulate real world data by including overdispersion within relationship classes. Overdispersion represents how much more variable observations are than a particular model assumes. In practice, overdispersion from a theoretical distribution could be caused by a variety of behavioural, psychological, environmental or measurement issues. Overdispersion in binomial data is often modelled via beta-binomial distributions. The beta-binomial distribution results from binomial trials in which the probability of success is not constant but follows a beta distribution with shape parameters β1 and β2. In this context, we have found it more useful to consider an alternate parameterization based on the mean, μ = β1/(β1 + β2), and the overdispersion parameter ρ = 1/(β1 + β2 + 1).

The simulations used Poisson and beta-binomial distributions to produce sets of dij and xij, respectively. These simulations were parameterized to reflect the characteristics of real world datasets. We examined six real association datasets (two of which are used as examples, in the subsequent texts) from individually identified wild cetaceans, calculating mean(dij) and estimating overdispersion, ρ, for each. Overdispersion, ρ, was estimated using maximum likelihood assuming the number of components (K), as well as values of {μk} and {αk} are as estimated by the binomial mixture models (using ICL; see subsequent texts). These suggested reasonable ranges of mean(dij) from 15 to 100 and ρ from 0 to 0.01.

We simulated a population of N associating individuals (Ndyad = (N(N − 1)) / 2). We simulated social structure by setting the number of relationship classes, choosing frequencies and distributions of association probabilities for each type, assigning dyads to types and then generating true dyadic association probabilities. We then simulated observational sampling of associations from this social structure. More specifically, in a given simulation run with K relationship classes, we

  1. 1.

    Drew relative αk from a uniform distribution on [0, 1], with the constraint that min (αk) > 0.1/K

  2. 2.

    Drew μk from a uniform distribution on [0, 1], with the constraint that they were at least 0.1 apart

  3. 3.

    Drew ρk from a uniform distribution on [0, 0.015]

  4. 4.

    Assigned k (ij) to dyads with probability αk

  5. 5.

    Generated Rij for each dyad from a beta distribution with mean μk(ij) and overdispersion parameter ρk(ij)

  6. 6.

    Generated dij from a Poisson distribution with mean D

  7. 7.

    Generated xij from a binomial distribution with probability Rij and dij trials

From these simulated social structures, we measured realized association complexity from the k (ij) and Rij and then fit a series of binomial mixture models with K = 1, 2, 3, 4, 5, 6, 7, 8, and 9 to the xij and dij. We chose a best value of K based on BIC, ICL, and AIC and recorded estimates of S based on the models chosen by each of these criteria.

We systematically varied the values of N, K, and D across simulations to test the method under different population sizes, social structures, and sampling effort. We ran 20 simulation runs for every combination of the following parameters: N = 20, 50; K = 1, 2, 3, 4, 5; D = 20, 40, 60, 80, 100.

To examine model performance at estimating S and K, we analysed the mean error in model estimates under different conditions. This gave us a measure of the degree to which our model accurately reflects actual complexity under different conditions, as well as allowing us to examine the model output for bias. We also estimated the correlation between true and estimated values of S for each criterion and under different conditions, to determine the degree to which we can expect the output of the model to reflect differences in complexity between societies.

We also tested our model for sensitivity to systematic increases in overdispersion. Using N = 20, K = 1, 2, 3, 4, 5 and D = 20, 40, 60, 80, 100, we ran simulations in which we defined a common overdispersion parameter ρ for all components. We used ρ = 0.005, 0.01, 0.015, 0.02, running 20 simulations for each combination of parameters. We examined our model for biases introduced by increased overdispersion by analysing the mean error in estimates of S and K in relationship to overdispersion, social structure and sampling.

Illustration using real data

We used two real datasets to illustrate the method. These analyses are illustrative only and are not necessarily optimal analyses of these data. Photoidentification data on 30 northern bottlenose whales (Hyperoodon ampullatus) were collected off Nova Scotia, Canada, between 1988 and 2003, as in Gowans et al. (2001) with some extra data from later years. Photoidentification data on 77 female sperm whales (Physeter macrocephalus) were collected off Dominica, West Indies, between 1984 and 2015, as in Gero et al. (2013a), again, with some extra data. In both studies, sampling periods were days, only individuals identified on more than 10 days were included, association of a dyad was defined as identified within 10 min on the same day, and association indices were calculated using the simple ratio index. For each dataset, we used the binomial mixture model together with the ICL criterion to estimate the number of relationship classes and the characteristics of each, as well as an estimate of association complexity (from Eq. (3)).

Computer code

This work was carried out in parallel and largely independently using the packages R (by MW) and MATLAB (by HW). Functions for using binomial mixture models on association data in both languages are given in the Supplemental material.

Results

Testing the method

As expected, most variation in S in our simulations was driven by differences in the number of relationship classes, as demonstrated by a high correlation between true values of S and K (r = 0.93, Fig. 2). However, when only considering cases in which K > 1 (as when K = 1, S is always 0), the correlation was much lower (r = 0.67), and a significant degree of overlap in values of S between different values of K was apparent (Fig. 2). While the number of relationship classes greatly affects the complexity of associations, the frequency and strength of relationship classes are also an important factor.

Fig. 2
figure 2

Distributions of realized complexity values (S) between societies with different numbers of relationship classes (K). Violin plots represent density estimates and quartiles of true S values for each value of K used. Simulation runs for K = 1 are not plotted as these runs, by definition, have S = 0. Blue points represent the maximum possible entropy for each value of K. Each distribution represents the results of 500 simulation runs

The results of our simulation study largely suggest that ICL is the best criterion to use for these models. The correlation between the estimates of S via ICL and true complexities across all parameters was 0.9, while AIC and BIC had overall correlations of 0.79 and 0.78, respectively. This high correlation for ICL across sampling efforts, network sizes, and social structures indicates that estimates of S based on models chosen via ICL are highly comparable between networks. At low sampling efforts (D < 40), ICL does give estimates of S less correlated with true complexities than AIC or BIC, but it rapidly tends towards a perfect correlation with increased sampling effort. In contrast, the correlations between true and estimated complexities obtained by AIC and BIC do not increase with sampling effort and are consistently below 0.9 (Fig. 3, left).

Fig. 3
figure 3

Correlation between real and estimated S (left) and mean error in estimates of S (right) for each criterion under different levels of sampling effort (expressed as mean denominator, D) and network sizes (in number of individuals, N). Each data point is based on 250 simulation runs (50 runs for each value of K). Dotted black line indicates a mean error of 0

AIC and BIC were both likely to overestimate the complexity of a social structure, and this overestimation was exacerbated by increased sampling effort. In contrast, the estimates obtained by ICL are downward biased at low sampling rates, but the bias decreases as sampling effort increases. This indicates that ICL estimates are unlikely to be overestimates of true complexity, but large amounts of data (D > 80) are likely needed to ensure accurate estimates. However, even at low sampling rates, the bias is less than 0.5 (Fig. 3, right).

In addition, both AIC and BIC provide estimates that are sensitive to network size in our simulations, with larger networks having added positive bias. In contrast, ICL did not give estimates biased by network size (Fig. 3) and, thus, provide an estimate of complexity that is comparable between social networks of different sizes and levels of completeness (a reasonable, roughly random subset of a larger network should provide a similar estimate as the full network).

ICL was prone to underestimating both S and K at low sampling rates. This tendency was exacerbated by social structures with more relationship classes. This bias was relieved with increased sampling effort. In addition, ICL rarely found multiple relationship classes in social structures in which there was only one class of dyad (Fig. 4). Therefore, while we suggest the use of ICL to choose the number of components in these models, as it gives good estimates that are comparable between networks, we caution that these estimates will likely be underestimated with low sampling intensity, particularly for complex social structures.

Fig. 4
figure 4

Relationship between input value of K and error in estimates of S and K obtained from models chosen via ICL. Colors indicate simulated sampling effort (as expressed by mean denominator of association indices, D). Results are presented based on runs with N = 20, and each data point represents the mean of 50 simulation runs. Dotted black line indicates a mean error of 0

All criteria were somewhat sensitive to systematic increases in overdispersion. High levels of overdispersion led to overestimates of complexity, particularly under high sampling intensity. However, ICL was far less sensitive to overdispersion than AIC or BIC. At values of ρ < 0.015, ICL converged towards zero bias as sampling effort increased towards D = 100, and even at ρ = 0.015, upward bias at high sampling intensity was small. At ρ = 0.02, upward bias at high sampling intensities became more pronounced (Fig. 5).

Fig. 5
figure 5

Results of overdispersion simulation. Values shown are mean error in estimates of S for all runs with a given overdispersion parameter. Colors indicate criteria used to estimate the number of components. Dotted black line indicates a mean error of 0

Illustration using real data

The distributions of simple ratio association indices for the northern bottlenose whale and sperm whale datasets are shown in Fig. 5. Mixture models suggested 2 relationship classes for the northern bottlenose whales with an association complexity of S = 0.69 and 3 relationship classes for the sperm whales with an association complexity of S = 0.91. The mean denominators of the association indices and estimates of overdispersion were D = 34.6 and ρ = 0.010 for the northern bottlenose whales and D = 59.9 and ρ = 0.007 for the sperm whales. Using the simulation data in Fig. 4, these suggest that our model estimates may have small (< 0.2) downward biases.

Figure 6 shows the estimated distribution of real association indices from the binomial mixture models and estimates of overdispersion. While they roughly match the distribution of measured association indices, the matching is not too good, but it is must be remembered that the measured association indices include sampling error while the estimated real association indices do not.

Fig. 6
figure 6

Distribution of measured association indices for northern bottlenose (above) and sperm (below) whales together with estimated relationship classes from binomial mixture models with ICL, with intra-class dispersion estimated using maximum likelihood

Both species have a preponderance of extremely low association relationships (μ1 = 0.017 and α1 = 0.88 for the northern bottlenose whales; μ1 = 0.002 and α1 = 0.90 for the sperm whales), as well as some low association relationships (μ2 = 0.125 and α2 = 0.12 for the northern bottlenose whales; μ2 = 0.072 and α2 = 0.07 for the sperm whales). The sperm whales additionally have a much smaller class of fairly strong association relationships (μ3 = 0.252 and α3 = 0.03). The latter correspond to relationships within social units (Gero et al. 2013a).

Discussion

We have presented a method for quantifying the complexity of association networks based on dyadic sighting histories. We use binomial mixture models to estimate the number of different classes of relationship and the association frequencies of each class and take the diversity of these frequencies as our measure of association complexity. Our results show that this approach can generally be used to effectively model the dyadic associations and measure network complexity and is comparable between networks.

Hinde (1976) defined social structure as the “nature, quality, and patterning of relationships”. Ideally, we would measure complexity from all three of these elements. However, it is well-known that measures of the global patterning of relationships—such as metrics from network analysis—are not comparable between networks, due to the dependency of these measures on network size and density (Faust 2006; Rito et al. 2010; van Wijk et al. 2010). This is a significant problem for the field of animal social networks because it makes the comparative approach difficult. Our method instead examines social complexity through the nature and quality of dyadic relationships—providing a bottom-up measure of complexity that can be fairly compared between association networks. Our method can therefore be used with a comparative approach to examine drivers of social complexity across populations, species and potentially taxa.

A previous approach to measuring dyadic complexity (Fischer et al. 2017) is a promising way forward for many systems, but it is not appropriate for association data, because it requires classes of interaction to be known and pre-defined in the complexity measure. The researcher needs data more detailed than just who was with whom (associations) and on whether an interaction is of the class aggressive or the class affiliative. Our approach instead seeks to automatically identify different classes of dyad based on the patterns of associations. The same limitations that apply to any analysis using association indices apply to our method. Since all that is being measured and modelled is the proportion of time individuals spend together, the nuances of social relationships are perhaps not captured by these measures. For example, our method would not be able to distinguish between two relationship classes that associate with the same probability but interact in different ways while associated. We suggest that our model will be a useful comparative tool when the collection of detailed interaction data is impractical, such as in studies of wild cetaceans.

Our complexity measure is unaffected by network size; since our measure is based on dyads, the association complexity of a reasonably well-sampled social network will be similar to that of the full network. Our measure is also fairly robust to the existence of individuals that are distantly connected to the network and thus observed infrequently. Although our method rarely estimates a higher level of complexity than that of the true network, low-intensity sampling biases it towards artificially low estimates of complexity. It is a common feature of social network analysis that low-intensity sampling produces metrics that are unreliable (Whitehead 2008; Franks et al. 2010; Farine and Whitehead 2015), and we, therefore, suggest that caution is taken when interpreting results from this model on sparsely sampled data.

Because the complexity measure is partly based on unevenness of dyadic weights, we might expect a network sampled with the gambit of the group to have a higher level of complexity than a network sampled by observing pairwise associations (e.g., by focal sampling). This is because there will be more casual acquaintances in the network as an artefact of the gambit sampling method. For example, both individuals A and B might only be observed together because they are both associating with individual C. Thus, when adopting a comparative approach, differences in sampling protocol will need to be considered.

Finally, the driver of association complexity needs to be considered for each social system, because complex social structures can arise through a number of mechanisms. Complex social structures, such as multilevel societies, can arise from cognitively demanding behavioural processes, such as cultural transmission (Cantor et al. 2015). However, complexity can also be driven by simple differences between individuals in their social behaviours (Firth et al. 2017). Furthermore, there is increasing recognition of the role that features of the physical environment play in shaping social structures (He et al. 2019, Topical collection on Social complexity). Therefore, it could be that the social decisions of individuals do not produce a complex network, but instead social complexity is driven by patterns of space use or the complexity of the environment (Titcomb et al. 2015; Leu et al. 2016). Complex patterns of overlapping space use could lead to higher estimates of social complexity with our method. It is therefore important that our proposed metric not be interpreted as a measure of the complexity of individuals’ social decision-making but rather as a feature of the social structure of the population.

If our measure of association complexity is to be widely used, it needs some measure of confidence. We suggest the temporal jackknife, in which different temporal segments of data are omitted in turn. This method is appropriate with behavioural association data when the nonparametric bootstrap cannot be used (as randomizing identities produces self-associations) (Whitehead 2008). Additionally, it would be helpful to give analytic estimates of the bias due to sampling rates and overdispersion that are indicated by our sensitivity analyses. There also could be more robust measures of association complexity from mixture model data that perform better than the Shannon index, but we have not yet found any.

The method that we have proposed could be varied or extended in several potentially productive ways. Using the same dataset, two or more measures of association could be defined, based on different behavioural states or ways of associating (e.g., Gero et al. 2005, 2013b). These, then, constitute multivariate relationship measures, which could be clustered using multivariate mixture models (McNicholas 2016). To obtain our univariate measure of association complexity, using Eqs. (2) and (3), we need some way of compounding the now vector-valued centroids of the clusters (μs), perhaps using principal components analysis. However, we could also calculate separate measures of complexity for each association measure, so that, for instance, complexity could be compared between behavioural states or modes of communication. Our association complexity measure(s) could also be used in parallel with other network or relationship measures, such as modularity (Newman 2006), to give a more nuanced comparison between social networks.

Many social network data are in the form of interaction rates (Farine and Whitehead 2015). Poisson mixture models would be appropriate in these cases, perhaps with offset variables indicating effort. These interaction rate data could be combined with each other, or with association data, in a multivariate mixture analysis. Offset variables may be useful more generally. For instance, generalized affiliation indices are the residuals from a regression of the measures of association or interaction on structural predictor variables, such as gregariousness or spatiotemporal overlap (Whitehead and James 2015). Inputting generalized affiliation indices into mixture models, either directly into Gaussian mixtures or as offsets in binomial or Poisson mixtures, could control for use of space and other confounds.

We have attached R and Matlab code for deriving association complexity using mixture models, and the method will also be incorporated in the next release of SOCPROG, a package for analysing animal social structures using individual identification data (Whitehead 2009).