Influence spreading model used to analyse social networks and detect sub-communities
Abstract
A dynamic influence spreading model is presented for computing network centrality and betweenness measures. Network topology, and possible directed connections and unequal weights of nodes and links, are essential features of the model. The same influence spreading model is used for community detection in social networks and for analysis of network structures. Weaker connections give rise to more sub-communities whereas stronger ties increase the cohesion of a community. The validity of the method is demonstrated with different social networks. Our model takes into account different paths between nodes in the network structure. The dependency of different paths having common links at the beginning of their paths makes the model more realistic compared to classical structural, simulation and random walk models. The influence of all nodes in a network has not been satisfactorily understood. Existing models may underestimate the spreading power of interconnected peripheral nodes as initiators of dynamic processes in social, biological and technical networks.
Keywords
Social networks, Influence spreading, Network dynamics, Influence measure, Network topology, Community detection, Community structure, Closeness centrality, Betweenness centrality
Background
Social influence measures have been developed by using, for example, local structural characteristics [1, 2], geodesic distances [3] and random walks [4]. Most of these measures do not have exact quantitative interpretations for general network structures and variable sizes of networks. Structural measures take into account local degrees of nodes in the neighbourhood of a source node. Geodesic based measures use distances from a source node. Random walks consider different paths from a source node to a target node, but the method still does not succeed in combining the contributions from alternative paths into an exact quantitative measure.
Models for the process by which influence or ideas propagate through a social network have been studied in a number of research articles, for example in [5, 6, 7, 8, 9]. A recent article [10] reviews theories for influencer identification in complex networks. Many aspects should be considered when constructing measures for describing and comparing social networks. Several studies propose influence measures for identifying the most influential spreaders or mediators. When spreading processes are analysed, the concept of time should have a role in the model. Some models presented in the literature are static and do not investigate processes evolving dynamically, or do not provide justification for how the models describe a steady state or limiting states of a network. Usually network structures are not calculated exactly; random walk in a network is an example. One requirement for the theory is a quantitative model with natural interpretations of the variables. This guarantees that the numerical values obtained for all kinds of network topologies and different temporal spreading distributions can be compared with each other. A valid theory and an applicable model are needed to combine the spreading process evolving as a function of time and the structure of a network.
Computational difficulties must be solved in keeping track of various paths and their possible interdependencies. In large networks computing time may set practical constraints for calculations. One requirement for research of large social, biological and technical networks is a scalable computing algorithm [11]. Good approximations can be achieved with limited path lengths, as the rule of six degrees [12] is valid for many kinds of social networks. Limited path lengths can provide good results in community detection algorithms. In the literature many community detection algorithms take into account only local interactions [13].
Introduction
The aim of this paper is to provide answers to the requirements presented in the previous section. Possible models for describing the temporal spreading process are proposed. A method for modelling the topological structure of a network is presented. Probability theory is used for combining the spreading via all the possible paths from a source node to a target node. Possible dependencies between different paths are taken into account. With these building blocks various problems in social network analysis, and in many other fields of network science, can be solved [14, 15, 16].
We present specifications for the most important measures needed to investigate social networks. These are node level ego-centric centrality and betweenness measures. Closeness centrality describes a node’s power to spread influence to other nodes in the network. Betweenness is a measure of the influence of nodes in a network relative to the flow of information between others. Betweenness centrality tends to pick out nodes that play the role of brokers between communities. In addition, an overall network measure, expressed as a function of time that combines different properties of the network, is presented. After all, different measures for different purposes can be constructed. For example, the concept of betweenness can be understood in many ways, which makes it impossible to define one absolute betweenness measure.
We demonstrate the method with a real social network documented in the literature and compare the results with the corresponding study published recently. The same network has been investigated in [3] where a comprehensive model suitable for local and global aspects of a social network has been presented. In [3], a model with an adjustable parameter for weighting neighbouring and distant nodes in the network has been used to determine measures for centrality and betweenness.
In many networks a community structure exists, in which network nodes are connected together in groups, between which there are fewer connections. A number of methods and algorithms have been proposed for detecting communities in social networks, for example in [17, 18, 19, 20, 21, 22, 23]. The research has often been focused on developing different or more efficient algorithms and different implicit or explicit definitions of community [24, 25]. As different definitions of community exist, different algorithms are needed for discovering various kinds of communities.
Some of the algorithms for detecting communities in a network structure are minimum-cut method, hierarchical clustering, Girvan–Newman algorithm, modularity maximization, statistical inference [26], and clique-based methods. Descriptions of the methods can be found for example in [13, 24, 25, 27]. Many classical algorithms for partitioning network nodes into groups are based on matrix and linear algebra methods. Examples are analogues of the Kernighan–Lin algorithm [28] for maximizing modularity and an analogue of the spectral graph partitioning [29, 30] algorithms for community detection. A definition for modularity is the fraction of the edges that fall within the given group minus the expected fraction if edges were distributed at random. The Kernighan–Lin algorithm is based on repeatedly moving, starting from some initial division, the vertices that most increase or least decrease the modularity.
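The modularity definition quoted above can be illustrated with a minimal sketch for an undirected, unweighted network. The function name `modularity` and the input format (an edge list plus a node-to-community dictionary) are assumptions made for illustration and are not taken from the cited methods.

```python
def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted network.

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping every node to a community label
    """
    m = len(edges)
    internal = {}   # number of edges with both ends in a community
    degree = {}     # number of edge ends attached to a community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Fraction of edges inside each community minus the fraction expected
    # if edge ends were connected at random.
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Two triangles joined by a single edge, split into the two triangles.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
GROUPS = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(EDGES, GROUPS)
```

For this toy network each triangle holds 3 of the 7 edges and 7 of the 14 edge ends, so the value is 2 × (3/7 − 1/4) = 5/14.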
The Louvain method for community detection is a greedy modularity optimization method to extract communities from large networks [31]. For investigation of large-scale biological and social community structures an information theoretic approach has been presented in [32]. Probability flow of random walks on a network is used as a proxy for information flows. There are a number of other greedy or SDP-based (semi-definite programming) approaches for finding communities in large networks [33, 34].
A classification for community discovery methods in complex networks has been presented in [24]. Eight different community discovery methods have been described in the review: feature distance, internal density, bridge detection, diffusion, closeness, structure, link clustering and meta-clustering. Altogether 39 algorithms classified in these eight categories have been described in [24]. One of the methods is more relevant from our perspective: a diffusion community in a complex network is a set of nodes that are grouped together by the propagation of the same property, action or information in the network. In [24] a meta-procedure for detecting a diffusion community has been defined: Perform a diffusion or percolation procedure on the network following a particular set of transmission rules and then group together any nodes that end up in the same state. In this respect, a community can be defined as a set of target nodes influenced by a fixed set of source nodes. In the financial networks literature, a decaying influence model describing propagation of shocks on banking networks has been studied in [35].
Outline
The focus of this paper is to present a new influence spreading model and its applications with examples. Accordingly, the main content of this study is presented in “Theory of social influence measures”, “Applications of social influence measures” and “Numerical results and discussion” sections. In addition, the next section introduces classical definitions of closeness centrality and betweenness centrality as well as a recent extension of the measures to consider both local and global network structure. Lastly, conclusions provide a short summary of the paper.
The theory is presented in several phases. First, information and influence propagation models are discussed. Next, the influence spreading measure between two nodes of a network is presented with the help of an example network of Dutch students’ social network [3, 36]. Then follow definitions of quantities and the general method of combining paths between two nodes of a network. Temporal Spreading of Influence is a sub-model describing time dependence of the spreading process. After this, a high-level algorithm is presented for computing the influence spreading matrix describing the spreading between all nodes of a network.
Applications of the theory of Social Influence Measures are based on the Social Influence Matrix. In this part of the study, definitions of closeness centrality, betweenness centrality and community detection measures are presented.
The model is demonstrated by presenting results for closeness centrality, betweenness centrality and analysis of community structures. Closeness centrality and betweenness centrality measures are illustrated with the Dutch students’ social network [3, 36]. Four different networks are used as examples for detecting communities and investigating network structures. As an introduction, an artificial network of the Game of Risk [37] is analysed. Then the 32 Dutch students’ social network [36] is investigated introducing more complex structures. Next, an animal social network of dolphins [38] is analysed along with some comments on similarities and differences with respect to human social networks. The scalable version of the algorithm [11] is used for computing the influence spreading matrix for a Facebook social network of 4039 users. The matrix is used as input information for the community detection algorithm.
Geodesic based centrality and betweenness measures
Several measures of centrality and betweenness have been proposed in the literature [1, 39]. Recently, geodesic based centrality and betweenness measures, unifying the local and the global network structure, have been presented [3].
Theory of social influence measures
Information and influence propagation models
Different propagation models for influence spreading can be defined depending on the phenomena we are studying. In the context of this paper, two main issues are important. Firstly, a model has to be decided for the time distribution that describes spreading of influence from one node to another. Secondly, propagation can proceed independently of states of mediating nodes, or propagation depends on the states of the nodes along the paths between a source and a target node.
In this paper we use the Poisson distribution as the time distribution for propagation between nodes. In the model, it is easy to use any statistical distribution or empirical data instead of the Poisson distribution. We have made experiments with a model based on the uniform distribution. This describes, for example, propagation of information via e-mails when users process their e-mails at uniformly distributed time points during a day (or other time unit). This distribution gives comparable, but not identical, results because more spreading occurs at low time values when propagation obeys the uniform instead of the Poisson distribution.
The second issue, when the spreading process depends on the states of the intermediate nodes, is more involved. Dependency on static node attributes is a minor addition to the model because the model takes into account nodes and links individually. Dynamic dependency on time dependent states of nodes can be compute-intensive because simulations or iterative algorithms probably are necessary to solve the problem. In this paper, only state independent propagation models are studied. Nodes mediate influence regardless of their own state and states of all the other nodes of the network.
A realistic model for information propagation may be a state dependent variant of the model where propagation events (attempts of influence) occur only for new information (or probability for new information is higher). In other words, information is mediated to neighbouring nodes only in cases when the node is unaware of the information before the propagation event. Nodes are less willing to propagate known information than new information.
1. Probability of spreading influence depends on the states of nodes. Nodes along the paths between a source and a target typically are less eager to mediate information already known.
2. In the state independent model, propagation occurs independently of nodes’ states. The probability to receive and forward influence is determined by the time dependent probability, node weighting factor and link weighting factor.
In summary, the propagation model has the following characteristics:
1. The temporal distribution describes a node’s delay between receiving an influence spreading event and forwarding the event to neighbouring nodes. Links between nodes have no delays.
2. Node weighting factor \(w_{N = i}\) for node \(i\) describes node \(i\)’s activity, that is, the probability of forwarding an event of influence to neighbouring nodes. Similarly, link weighting factors \(w_{L = i,j}\) are additional factors needed in cases where the influence spreading between nodes \(i\) and \(j\) is not equal for all the directed links between nodes of the network.
3. The spreading process is assumed to start from one node in the network at time \(T = 0\).
Influence spreading measure between two nodes
Example network
In this section, we illustrate mathematical methods of modelling influence spreading measures \(C_{s,t} \left( T \right)\) between a source node \(s\) and a target node \(t\) in a small social network at time \(T\). Based on these results, new measures of centrality, betweenness and community detection are defined.
Our method takes into account all the possible self-avoiding paths in the network. The generalization to paths in which nodes may appear several times is possible and easy to compute [11]. However, then we must have a limit for path lengths or for the number of possible occurrences on the path. This generalized method requires less computer memory because the list of paths need not be saved. For large networks the number of different paths is high and saving memory is important. Self-avoiding paths are suitable for the purposes of presenting the method. Later in this paper, results for a larger social network of 4039 Facebook users will be provided where influence propagation via paths with loops is considered.
The 14 paths from Node 1 to Node 4 of the network in Fig. 1
# | Nodes in a path |
---|---|
1 | 1-3-15-6-27-4 |
2 | 1-3-15-6-27-20-4 |
3 | 1-3-15-6-27-20-30-4 |
4 | 1-3-15-6-27-20-30-13-4 |
5 | 1-3-15-6-27-30-4 |
6 | 1-3-15-6-27-30-13-4 |
7 | 1-3-15-6-27-30-20-4 |
8 | 1-3-27-4 |
9 | 1-3-27-20-4 |
10 | 1-3-27-20-30-4 |
11 | 1-3-27-20-30-13-4 |
12 | 1-3-27-30-4 |
13 | 1-3-27-30-13-4 |
14 | 1-3-27-30-20-4 |
To be precise, we present below in Eq. (4) all the steps of computing the probability of influence spreading from Node 1 to Node 4. We denote intermediate steps by subscripts in parentheses. For example, in the first step, \(P_{\left( 1 \right)}\) is the combined result of paths 1-3-15-6-27-20-30-4 and 1-3-15-6-27-20-30-13-4 with path lengths 7 and 8 and common path length 6. The steps of the computing algorithm are shown in parentheses in Fig. 2. The last step in Eq. (4), denoted by \(G_{1,4,\left( 1 \right)}\), gives the final result of the influence of Node 1 on Node 4.
In the example of computing the influence of Node 1 on Node 4 all the paths go through Node 3 (link 1–3). If we consider the influence of Node 3 on Node 4, we observe from Fig. 1 that two independent possibilities occur, via links 3–27 and 3–15–6–27. In this particular case, these two contributions are denoted by \(G_{3,4,\left( 1 \right)}\) and \(G_{3,4,\left( 2 \right)}\). In the following, we denote the number of possible independent contributions by \({\mathcal{I}}\).
Combining paths between two nodes
Computing \(G_{n,j,\left( x \right)} \left( {w,T} \right)\) requires searching all the different paths from Node \(n\) to Node \(j\) with path lengths less than an upper limit \({L_\text{max} }\). Parameter \(L_{ \text{max} }\) is the maximum path length and it is used to restrict the number of paths and computing time in large networks. Searching the paths is a straightforward task by using the network topology information by following links between the source node and target nodes. The computation is conducted simultaneously from one source node to all the nodes in the network. The algorithm for computing \(G_{n,j,\left( x \right)} \left( {w,T} \right)\) handles the paths in the descending order of the number of common links at their beginning among the set of paths from Node \(n\) to Node \(j\). A simple method would be first to list all the paths and then compute the influence spreading matrix \(C_{n,j} \left( {w,T} \right), \quad n,j = 1, \ldots ,N\).
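The path search described above can be sketched as a depth-first enumeration of self-avoiding paths with a length limit. This is a minimal illustration, not the algorithm of [11]: the function name `self_avoiding_paths` is hypothetical, the limit is interpreted as "at most \(L_{\text{max}}\) links", and the directed link list `LINKS` is an assumption read off the 14 paths listed in the table of paths from Node 1 to Node 4, so it covers only that part of the example network.

```python
def self_avoiding_paths(links, source, target, l_max):
    """Yield each self-avoiding path from source to target with at most l_max links."""
    path = [source]
    visited = {source}

    def extend(node):
        if node == target:
            yield list(path)
            return
        if len(path) - 1 >= l_max:      # the path already uses l_max links
            return
        for nxt in links.get(node, ()):
            if nxt not in visited:      # reject paths that revisit a node
                path.append(nxt)
                visited.add(nxt)
                yield from extend(nxt)
                path.pop()
                visited.remove(nxt)

    yield from extend(source)


# Assumed directed links, inferred from the listed paths only.
LINKS = {
    1: [3], 3: [15, 27], 15: [6], 6: [27],
    27: [4, 20, 30], 20: [4, 30], 30: [4, 13, 20], 13: [4],
}

paths = list(self_avoiding_paths(LINKS, 1, 4, l_max=8))
short_paths = list(self_avoiding_paths(LINKS, 1, 4, l_max=3))
```

With these links and `l_max=8` the sketch returns the 14 tabulated paths, and with `l_max=3` only the path 1-3-27-4 remains. Because the search is depth-first, paths that share their first links are produced next to each other, which is convenient when paths are later combined by the number of common initial links.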
The most time consuming task, when computing self-avoiding paths, is keeping track of nodes and rejecting paths where a node appears more than once. This is the reason why the algorithm relaxing the condition of self-avoidance and allowing loops has significantly lower computer running times, essential for large social networks [11].
Weighting factors describe the probability of propagating information and opinions. We call this the activity of nodes (or links). Opinion changes in social networks are uncommon when the new ideas are unfamiliar to members of a social network. (To be precise, we should distinguish between influence spreading and opinion spreading. These concepts are related, but usually different parameters are needed. Even a different spreading model may be needed if the probability to change opinion is conditional on information or influence spreading events.) Technical and biological networks have similar features. Spreading of a computer virus or a biological virus between nodes can have a low probability because of virus protection, vaccination or characteristics of the virus itself.
At the beginning of this section, Eq. (5) describes non-mutually exclusive events in basic probability theory. It serves as an introduction between commonly known methods of probability and the method of this study for combining probabilities of influence spreading via different paths in a network. In fact, Eq. (5) is the last step in the algorithm with \(L = 0\) and \(P_{L} \left( T \right) = 1\) in Eq. (7). As a consequence, we could have omitted Eq. (5) because it can be regarded as the last step of the general algorithm.
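Equations (5) and (7) are not reproduced in this text, so the following is only a sketch under an explicit assumption: two paths with spreading probabilities \(P_1\) and \(P_2\) that share their first \(L\) links are combined as \(P_1 + P_2 - P_1 P_2 / P_L\), where \(P_L\) is the probability of spreading over the common part. This assumed form is chosen because it reduces to the rule for non-mutually exclusive independent events, \(P_1 + P_2 - P_1 P_2\), when \(L = 0\) and \(P_L = 1\), as stated above. The function name `combine_paths` is hypothetical.

```python
def combine_paths(p1, p2, p_common=1.0):
    """Combine two path probabilities that share a common initial part.

    Assumed rule: the common part (probability p_common) is counted once,
    and the two continuations are treated as independent alternatives.
    With p_common = 1.0 this is the usual P(A or B) for independent events.
    """
    return p1 + p2 - p1 * p2 / p_common


independent = combine_paths(0.5, 0.5)               # no common links
shared_start = combine_paths(0.25, 0.25, 0.5)       # common part has probability 0.5
```

In the second call each path is the common part (0.5) followed by a continuation of probability 0.5, and the result equals 0.5 × (1 − 0.5 × 0.5) = 0.375, which is lower than the 0.4375 obtained if the two paths were wrongly treated as independent.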
Temporal spreading distribution
Here, the interpretation is that the spreading has advanced \(L\) or more links in the network at time \(T\). Equation (8) takes into account nodes’ delays between receiving an influence spreading event and forwarding the event to neighbouring nodes. When time approaches infinity, nodes’ probability of spreading influence approaches one. In Eq. (8), the number of spreading events is denoted by stochastic variable \(K\left( T \right)\). The intensity parameter of Poisson distribution is denoted by \(\lambda .\) The statistical distribution and its parameters determine the spreading rate in the network. The Poisson distribution is not the only possibility, for example, a model based on Uniform distribution may better describe some other temporal spreading behaviour.
Parameter \(\lambda\) can be estimated from empirical influence propagation data. In most cases, this kind of time dependent information is not available. If empirical data are not available, the intensity parameter could be evaluated by comparing with analyses of other networks with a comparable level of development. Values of \(\lambda\) and time \(T\) are related in Eq. (8), and also the quantity \(\lambda T\) can be estimated. It describes the maturity level of propagation on the network. In practice, evaluating the model parameters is not simple because nodes may have individual characteristics. But if this kind of empirical data is available, the model can be used with different parameters for each node and link of the network. In addition, the stochastic distribution of Eq. (8) can be replaced by an empirical distribution.
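Equation (8) is not reproduced in this text. Assuming it expresses \(P_{L}(T) = P\left( K(T) \ge L \right)\) for a Poisson distributed \(K(T)\) with mean \(\lambda T\), as the description above suggests, the temporal factor can be sketched as follows. The function name `prob_at_least` is hypothetical.

```python
import math


def prob_at_least(L, lam, T):
    """P(K(T) >= L) for K(T) ~ Poisson(lam * T).

    Interpreted as the probability that spreading has advanced
    L or more links by time T.
    """
    mean = lam * T
    term = math.exp(-mean)      # P(K = 0)
    below = 0.0                 # accumulates P(K < L)
    for k in range(L):
        below += term
        term *= mean / (k + 1)  # P(K = k + 1) from P(K = k)
    return 1.0 - below


p_one_link = prob_at_least(1, lam=1.0, T=1.0)    # 1 - exp(-1)
p_long_path = prob_at_least(20, lam=1.0, T=5.0)  # contribution of very long paths
```

Under this assumption the value is 1 for \(L = 0\), increases towards 1 as \(T\) grows, and is already very small for \(L = 20\) at \(\lambda T = 5\), which illustrates why long paths contribute little for typical temporal distributions.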
Algorithm for computing the influence spreading matrix
An algorithm for computing the values of bidirectional influence measures [11] between all the pairs of nodes in a network is presented. These values make up an \(N \times N\) influence spreading matrix, where \(N\) is the number of nodes in the network. The matrix is computed for discrete spreading time values of interest. Closeness centrality, betweenness centrality and community detection measures are defined with the help of the matrix elements. The algorithm for computing the influence spreading matrix \(C_{s,t} \left( T \right), \quad s,t = 1, \ldots ,N\) is described below. Comments in the algorithm below are denoted by ‘/* */’.
The applicability of shortest-path-based centralities is limited by the high computational complexity of calculating the shortest paths between all pairs of nodes [10]. A generalization of closeness centrality [40] considers all paths in the network and assigns a larger weight to shorter paths using a tuneable parameter. The method presented in this paper considers all the paths in a network with an additional feature of modelling common links of the paths at the beginning of their routes from a source node to target nodes.
Most of the results of this paper are for self-avoiding paths. A self-avoiding path is a sequence of moves on a path that does not visit the same node more than once. The networks of this paper are selected for illustrative purposes and therefore are small when compared with many modern applications and interests of research.
Computing times of the fast algorithm [11] for large social media networks
Social media | Nodes | Links | Computing time |
---|---|---|---|
Facebook | 4039 | 88,234 | 1 min |
Twitter | 81,306 | 1,768,149 | 5 h |
Google+ | 107,614 | 13,673,453 | 4 days |
The algorithm, especially suitable for influence spreading modelling of social networks, allows returns to nodes during the spreading process. The design of the algorithm is based on this property. In fact, disallowing loops would make the algorithm in [11] slower. The value of maximum path length \(L_{ \text{max} }\) can be set to 20 because higher terms are negligible for typical temporal spreading distributions. Lower values are used if they describe the real-world phenomenon by limiting the spreading process to shorter path lengths.
Applications of social influence measures
Definition of closeness centrality measures
In the following we use both normalized and un-normalized versions of centrality and betweenness measures. Normalized measures are divided by the number of nodes \(N\) of the network. These measures have a natural interpretation: normalized measures are probabilities and un-normalized measures are expressed in units of number of nodes.
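Equation (9) is not reproduced in this text. As a hedged sketch, assume the normalized closeness centrality of node \(s\) is the sum of the influence spreading matrix elements \(C_{s,t}\) over all targets, with the source itself counted as 1.0, divided by \(N\). The function name `closeness` and the nested-list matrix format are illustrative assumptions.

```python
def closeness(C, normalized=True):
    """Closeness centrality of every node from an influence matrix C.

    C[s][t] is the influence of node s on node t. The diagonal is ignored
    and the source node is counted with the value 1.0 (assumed convention).
    """
    n = len(C)
    result = []
    for s in range(n):
        total = 1.0 + sum(C[s][t] for t in range(n) if t != s)
        result.append(total / n if normalized else total)
    return result


# Toy 3-node matrix; node 1 has no influence on any other node.
C_TOY = [
    [0.0, 0.5, 0.25],
    [0.0, 0.0, 0.0],
    [1.0, 0.5, 0.0],
]
values = closeness(C_TOY)
```

Under this assumption a node without any outgoing influence gets the value \(1/N\), which for a 32-node network is 0.031, the smallest value appearing in the closeness results table of this paper.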
Definition of betweenness centrality measures
In the definition of Eqs. (9–12), normalization is a matter of choice. We have decided to include source nodes with the value of 1.0 in the formulas, and as a result \(N\) is used as the normalization factor. The source node is assumed to be the initiator of influence spreading with probability 1.0.
Definition and algorithm for computing a community detection measure
Similarly, the classical Kernighan–Lin algorithm [28] is based on moving nodes between two factions of a network. However, the Kernighan–Lin algorithm searches for a community of pre-determined size and provides no sub-structures. In addition, the model of this paper calculates influence between all the nodes of the network as a function of time. In contrast, the Kernighan–Lin algorithm is based on modularity maximization of the community and local topology of the network when determining which nodes to exchange between the two factions. Other community detection methods have been reviewed in [13], where strengths and weaknesses of modern methods are pointed out and directions are given for their use.
Typically, social networks with weak interactions between nodes, or social networks in their early development phases, have several local maxima with different compositions. These factions can overlap with each other. In many cases, unions and intersections of the divisions are also local maxima of Eq. (15) with some parameters of the model. If a union or intersection is not identified as a local maximum, these sets of nodes could still be considered as possible sub-groups of the network. In dynamic community building processes sets of nodes divided by different community boundaries may be left as outsiders. This is more probable if the measure of Eq. (15) has a low value or several divisions have almost equal numerical values.
Computing the community detection measure of Eq. (15) can be time consuming for large networks. This is a cost of considering influence spreading globally in the network. Several methods can be used to optimize the algorithm. First, limiting the computation to local nodes is an obvious alternative. Further, if a limited sub-set of the network is of interest, approximations can be computed by considering only the selected sub-set and some neighbouring nodes and structures around it.
The method for community detection consists of two independent main algorithms. The first algorithm is optimized for describing social influence spreading. The scalable version of the algorithm [11] allows loops in the process of influence spreading. The second algorithm uses results of the first algorithm. The input for the second algorithm is the \(N \times N\) matrix \(C_{s,t}\) at time \(T\), and control variables if the analysis is limited to a specified portion of the network. This is relevant when very large social networks are investigated or a particular set of members of the social network is under investigation. Because the first algorithm is able to deal with large networks of up to 100,000 nodes, matrix \(C_{s,t}\) is usually computed for the entire network.
1. Compute the influence matrix \(C_{s,t} ,\; s,t = 1, \ldots ,N\). Closeness and betweenness centrality measures are results of this step.
2. Compute the list of communities and sub-communities. Communities and their nested and overlapping structures are analysed.
1. Randomize values of vector \(V\) of \(N\) elements. Vector \(V\) has elements of zeros and ones.
2. Use \(V\left( n \right),\; n = 1, \ldots ,N\) as the initial state of the network. If \(V\left( n \right)\) is one, node \(n\) belongs to the first faction of the division, and if \(V\left( n \right)\) is zero, node \(n\) belongs to the complement of the faction.
3. Compute the community detection measure \(P\) of Eq. (15). Denote the value of the initial state by \(P_{0}\).
4. Starting from Node 1, move nodes from one faction to the other. Denote the value of \(P\) by \(P_{i}\) for the \(i\)th move.
5. If the value of \(P_{i}\) is higher than \(P_{i - 1}\), move the node to the other faction; otherwise do not move the node.
6. After all the nodes of the network have been processed, start from Step 4 again.
7. Repeat Step 6 while the value of \(P\) is increasing; otherwise a local maximum has been found.
8. Repeat Steps 1–7 until a desired number of local maxima, or no new compositions, are found.
9. Analyse the list of detected communities. The list has the following information for every detected community: the value of \(P\), sizes of the two communities, and the list of nodes for the detected communities. Nested and overlapping structures are discovered from the list of nodes.
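The nine steps above can be sketched as a local search with random restarts. Because Eq. (15) is not reproduced in this text, the measure is passed in as a function, and the default `within_faction_influence` (the total influence between distinct nodes that lie in the same faction) is only an assumed stand-in. All function names and the nested-list matrix format are hypothetical.

```python
import random


def within_faction_influence(C, V):
    """Assumed stand-in for Eq. (15): influence summed over same-faction pairs."""
    n = len(V)
    return sum(C[s][t] for s in range(n) for t in range(n)
               if s != t and V[s] == V[t])


def local_maximum(C, V, measure=within_faction_influence):
    """Steps 3-7: sweep over the nodes, keeping a move only if P increases."""
    V = list(V)
    best = measure(C, V)                 # Step 3: P of the initial state
    improved = True
    while improved:                      # Steps 6-7: repeat full sweeps
        improved = False
        for n in range(len(V)):          # Step 4: try moving each node
            V[n] = 1 - V[n]
            p = measure(C, V)
            if p > best:                 # Step 5: keep the move
                best = p
                improved = True
            else:
                V[n] = 1 - V[n]          # undo the move
    return best, V


def detect_communities(C, restarts=20, seed=0, measure=within_faction_influence):
    """Steps 1-2 and 8-9: random initial divisions, then list distinct maxima."""
    rng = random.Random(seed)
    found = {}
    for _ in range(restarts):
        V = [rng.randint(0, 1) for _ in range(len(C))]
        p, V = local_maximum(C, V, measure)
        # A division and its complement describe the same split.
        key = tuple(V) if V[0] == 1 else tuple(1 - v for v in V)
        found[key] = p
    return sorted(found.items(), key=lambda item: -item[1])


# Toy influence matrix: two groups of three nodes, strong ties inside (0.9),
# weak ties between the groups (0.1).
C_TOY = [[0.0 if s == t else (0.9 if s // 3 == t // 3 else 0.1)
          for t in range(6)] for s in range(6)]
p_max, division = local_maximum(C_TOY, [1, 1, 1, 1, 0, 0])
results = detect_communities(C_TOY)
```

Starting from the division {0, 1, 2, 3} versus {4, 5}, the sweep moves node 3 to the other faction and then stops at the two groups of three. With this stand-in measure the undivided network is also a maximum, so the full list returned by `detect_communities` needs the kind of inspection described in Step 9.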
A method to optimize the algorithm is to compute the list of communities in two phases. After detecting a desired number of communities with the basic algorithm, nested community structures are considered. In the second phase, in Step 2, the algorithm uses intersections \(C_{i} - C_{j}\) of detected communities \(C_{i}\) and their detected sub-communities \(C_{j}\), where \(C_{j} \subset C_{i}\). The intersections are often sub-communities or they are close to a composition of a sub-community. This makes computing times shorter because Steps 6 and 7 are iterated fewer times.
Secondary effects between the two factions are included when computing the individual influence measure of Eq. (6). A variant of the model would compute the two factions separately. This may better describe situations of the original social network splitting into two independent networks. The model presented in this paper is proposed for studying existing sub-communities and structures of a social network where interactions between sub-communities are continuous.
Numerical results and discussion
Numerical results for the centrality measure of Eq. (9) and the betweenness measures of Eqs. (12–14) are compared with the results of [3]. The betweenness measures of Eqs. (12–14) are defined with the help of removing one node from a network. This ensures that the closeness centrality and betweenness measures are consistent with each other. The method of community detection measure is also based on the same formulation. Results of analysing community structures of four different networks are presented after the results for closeness centrality and betweenness centrality.
Closeness centrality
Computing times of the community detection algorithm for searching 100 communities or sub-communities
Social media | Nodes | Links | Computing time |
---|---|---|---|
Facebook | 4039 | 88,234 | 1 min |
Enron e-mail | 36,692 | 183,831 | 6 h |
Results C_{n}(w_{N} = 1.0, T) from Eq. (9). The ranking columns list node numbers in descending order of C_{n}: row n shows the node ranked nth
n | C_{n}, T = 1 | C_{n}, T = 3 | C_{n}, T = 5 | Ranking, T = 1 | Ranking, T = 3 | Ranking, T = 5 |
---|---|---|---|---|---|---|
1 | 0.173 | 0.677 | 0.872 | 3 | 3 | 8 |
2 | 0.075 | 0.378 | 0.756 | 27 | 8 | 17 |
3 | 0.317 | 0.810 | 0.896 | 24 | 17 | 22 |
4 | 0.238 | 0.744 | 0.888 | 10 | 22 | 31 |
5 | 0.031 | 0.031 | 0.031 | 7 | 31 | 24 |
6 | 0.204 | 0.700 | 0.880 | 4 | 24 | 23 |
7 | 0.244 | 0.759 | 0.895 | 30 | 23 | 10 |
8 | 0.214 | 0.807 | 0.906 | 20 | 7 | 3 |
9 | 0.089 | 0.473 | 0.800 | 15 | 10 | 7 |
10 | 0.264 | 0.751 | 0.896 | 8 | 4 | 4 |
11 | 0.089 | 0.473 | 0.800 | 17 | 30 | 30 |
12 | 0.031 | 0.031 | 0.031 | 22 | 27 | 20 |
13 | 0.168 | 0.678 | 0.869 | 31 | 20 | 15 |
14 | 0.140 | 0.590 | 0.868 | 28 | 15 | 27 |
15 | 0.215 | 0.736 | 0.886 | 23 | 28 | 28 |
16 | 0.132 | 0.576 | 0.861 | 6 | 6 | 6 |
17 | 0.214 | 0.807 | 0.906 | 19 | 19 | 19 |
18 | 0.031 | 0.031 | 0.031 | 29 | 29 | 29 |
19 | 0.191 | 0.678 | 0.877 | 1 | 13 | 1 |
20 | 0.230 | 0.737 | 0.888 | 13 | 1 | 13 |
21 | 0.095 | 0.402 | 0.769 | 14 | 14 | 14 |
22 | 0.214 | 0.807 | 0.906 | 16 | 16 | 16 |
23 | 0.208 | 0.782 | 0.896 | 25 | 25 | 25 |
24 | 0.287 | 0.804 | 0.897 | 32 | 32 | 32 |
25 | 0.103 | 0.548 | 0.835 | 21 | 9 | 9 |
26 | 0.065 | 0.249 | 0.586 | 9 | 11 | 11 |
27 | 0.315 | 0.744 | 0.886 | 11 | 21 | 21 |
28 | 0.211 | 0.707 | 0.884 | 2 | 2 | 2 |
29 | 0.191 | 0.678 | 0.877 | 26 | 26 | 26 |
30 | 0.238 | 0.744 | 0.888 | 5 | 5 | 5 |
31 | 0.214 | 0.807 | 0.906 | 12 | 12 | 12 |
32 | 0.103 | 0.548 | 0.835 | 18 | 18 | 18 |
When \(w_{N} = 1.0\) and \(T = 1\), Nodes 3, 27 and 24 are the most central. Soon afterwards, Node 24 becomes more central than Node 27. After time \(T = 3\), Nodes {8, 17, 22, 31} are the most central. These four nodes occupy symmetrical positions in the network (indicated by the curly brackets) and have equal centrality values. These examples show that the most central nodes may change during the influence spreading process, as a consequence of the network structure. At an early phase of the process Node 27 is more central because the spreading has just started and direct connections from the source node are emphasized; Node 27 has a high degree of 8. Later, Node 24 becomes more important because of its central position between distant parts of the network, even though its degree is only 4. The ranking of Node 27 falls rapidly. Node 10 changes similarly, but at time \(T = 5\) it rises again because of the highly connected group of four Nodes {8, 17, 22, 31}.
We may also be interested in the least central nodes. Obviously, the isolated Nodes 5, 12 and 18 are the least central. Nodes 26 and 2 follow in this order. Next come Nodes {9, 11} and 21 at time \(T = 1\), and Nodes 21 and {9, 11} at times \(T = 3, 5\). Again, network topology has its implications: Nodes 9 and 11 benefit from their better connectivity at later development phases of the influence spreading. Curly brackets indicate that Nodes 9 and 11 are at symmetrical positions in the network structure.
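The way rankings and tie groups follow from the tabulated values can be sketched in a few lines of Python. The function name `rank_with_ties` is hypothetical; the centrality values are copied from the \(w_{N} = 1.0\) table for a handful of nodes only, so the output is a partial ranking and is not a reproduction of the full computation of Eq. (9).

```python
from itertools import groupby


def rank_with_ties(centrality):
    """Group node numbers by equal centrality value, highest value first."""
    ordered = sorted(centrality.items(), key=lambda item: (-item[1], item[0]))
    return [[node for node, _ in group]
            for _, group in groupby(ordered, key=lambda item: item[1])]


# Values copied from the w_N = 1.0 table (selected nodes only).
t1 = {3: 0.317, 27: 0.315, 24: 0.287, 10: 0.264}
t5 = {8: 0.906, 17: 0.906, 22: 0.906, 31: 0.906,
      24: 0.897, 3: 0.896, 10: 0.896, 27: 0.886}

print(rank_with_ties(t1))  # [[3], [27], [24], [10]]
print(rank_with_ties(t5))  # [[8, 17, 22, 31], [24], [3, 10], [27]]
```

Equal values at three decimals do not by themselves prove symmetrical positions: Nodes 3 and 10 share the rounded value 0.896 at \(T = 5\) without being described as symmetrical in the text.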
Results C_{n}(w_{N} = 0.5, T) from Eq. (9); the ranking columns list node numbers in descending order of C_{n}
n | C_{n}, T = 1 | C_{n}, T = 3 | C_{n}, T = 10 | Ranking, T = 1 | Ranking, T = 3 | Ranking, T = 10 |
---|---|---|---|---|---|---|
1 | 0.074 | 0.151 | 0.227 | 27 | 27 | 8 |
2 | 0.045 | 0.070 | 0.115 | 10 | 3 | 17 |
3 | 0.121 | 0.267 | 0.359 | 3 | 24 | 22 |
4 | 0.101 | 0.221 | 0.369 | 24 | 10 | 31 |
5 | 0.031 | 0.031 | 0.031 | 4 | 4 | 4 |
6 | 0.078 | 0.177 | 0.270 | 30 | 30 | 30 |
7 | 0.098 | 0.211 | 0.311 | 8 | 20 | 3 |
8 | 0.100 | 0.211 | 0.426 | 17 | 7 | 20 |
9 | 0.047 | 0.082 | 0.136 | 22 | 8 | 27 |
10 | 0.125 | 0.238 | 0.336 | 31 | 17 | 24 |
11 | 0.047 | 0.082 | 0.136 | 7 | 22 | 10 |
12 | 0.031 | 0.031 | 0.031 | 20 | 31 | 7 |
13 | 0.069 | 0.160 | 0.306 | 15 | 28 | 28 |
14 | 0.063 | 0.124 | 0.194 | 28 | 23 | 13 |
15 | 0.088 | 0.188 | 0.284 | 6 | 15 | 23 |
16 | 0.061 | 0.118 | 0.188 | 19 | 6 | 15 |
17 | 0.100 | 0.211 | 0.426 | 29 | 19 | 19 |
18 | 0.031 | 0.031 | 0.031 | 1 | 29 | 29 |
19 | 0.076 | 0.172 | 0.280 | 23 | 13 | 6 |
20 | 0.093 | 0.212 | 0.356 | 13 | 1 | 1 |
21 | 0.054 | 0.085 | 0.130 | 14 | 14 | 14 |
22 | 0.100 | 0.211 | 0.426 | 16 | 16 | 16 |
23 | 0.074 | 0.190 | 0.301 | 21 | 25 | 25 |
24 | 0.107 | 0.249 | 0.347 | 25 | 32 | 32 |
25 | 0.050 | 0.096 | 0.164 | 32 | 21 | 9 |
26 | 0.044 | 0.059 | 0.086 | 9 | 9 | 11 |
27 | 0.145 | 0.271 | 0.354 | 11 | 11 | 21 |
28 | 0.085 | 0.192 | 0.311 | 2 | 2 | 2 |
29 | 0.076 | 0.172 | 0.279 | 26 | 26 | 26 |
30 | 0.101 | 0.221 | 0.369 | 5 | 5 | 5 |
31 | 0.100 | 0.211 | 0.426 | 12 | 12 | 12 |
32 | 0.050 | 0.096 | 0.096 | 18 | 18 | 18 |
n | δ = 5 | δ = 1 | δ = 0.5 | Ranking, δ = 5 | Ranking, δ = 1 | Ranking, δ = 0.5 |
---|---|---|---|---|---|---|
1 | 0.103 | 0.354 | 0.548 | 27 | 27 | 3 |
2 | 0.034 | 0.227 | 0.439 | 10 | 3 | 27 |
3 | 0.175 | 0.469 | 0.635 | 3 | 24 | 24 |
4 | 0.135 | 0.355 | 0.539 | 24 | 10 | 10 |
5 | 0.000 | 0.000 | 0.000 | 7 | 7 | 7 |
6 | 0.074 | 0.339 | 0.534 | 4 | 15 | 15 |
7 | 0.138 | 0.396 | 0.578 | 15 | 23 | 23 |
8 | 0.132 | 0.327 | 0.516 | 30 | 4 | 1 |
9 | 0.035 | 0.257 | 0.469 | 8 | 30 | 4 |
10 | 0.199 | 0.415 | 0.586 | 17 | 1 | 30 |
11 | 0.035 | 0.257 | 0.469 | 22 | 20 | 6 |
12 | 0.000 | 0.000 | 0.000 | 31 | 6 | 20 |
13 | 0.067 | 0.267 | 0.469 | 20 | 28 | 28 |
14 | 0.07 | 0.305 | 0.506 | 1 | 8 | 8 |
15 | 0.135 | 0.376 | 0.561 | 28 | 17 | 17 |
16 | 0.069 | 0.297 | 0.500 | 6 | 22 | 22 |
17 | 0.132 | 0.327 | 0.516 | 19 | 31 | 31 |
18 | 0.000 | 0.000 | 0.000 | 23 | 19 | 19 |
19 | 0.072 | 0.317 | 0.516 | 29 | 29 | 29 |
20 | 0.104 | 0.339 | 0.530 | 14 | 14 | 14 |
21 | 0.066 | 0.253 | 0.456 | 16 | 16 | 16 |
22 | 0.132 | 0.327 | 0.516 | 13 | 13 | 25 |
23 | 0.072 | 0.359 | 0.558 | 21 | 25 | 32 |
24 | 0.140 | 0.430 | 0.609 | 25 | 32 | 13 |
25 | 0.036 | 0.265 | 0.476 | 32 | 9 | 9 |
26 | 0.034 | 0.199 | 0.408 | 9 | 11 | 11 |
27 | 0.264 | 0.470 | 0.623 | 11 | 21 | 21 |
28 | 0.103 | 0.334 | 0.526 | 2 | 2 | 2 |
29 | 0.072 | 0.317 | 0.516 | 26 | 26 | 26 |
30 | 0.135 | 0.355 | 0.539 | 5 | 5 | 5 |
31 | 0.132 | 0.327 | 0.516 | 12 | 12 | 12 |
32 | 0.036 | 0.265 | 0.476 | 18 | 18 | 18 |
To compare the results, we try to find corresponding columns in the tables. The correspondence is not unambiguous because the functional relationship between \(\delta\) in Eq. (2) and \(T\) in Eq. (9) is not known. Probably no exact functional form exists, because the structure of a network can have complex effects on the relationship. We provide an example of how the results can be compared. The rankings of the most central nodes are remarkably similar, although there are some distinctive differences.
Because the development phase of the social network is not known, it is not possible to determine the time value \(T\). We could examine all possible time values \(T\) and compare them with the results from Eq. (2) for all possible values of \(\delta\). On the other hand, Eq. (2) does not describe the dynamic development of the spreading process: the model of this paper is dynamic whereas the model of [3] is static. As a consequence, a full analysis is not necessary. Instead, we give an example that illustrates some similarities and differences of the results. For comparison, we choose one value of \(\delta\). Then we search Tables 3 and 4 for time values (columns) that give roughly the same rankings of the most central nodes, and conclude that these time values correspond to the results from Eq. (2) with that value of \(\delta\).
The parameter \(\delta\) and the time value \(T\) are also related: high values of \(\delta\) correspond to low values of \(T\). This can be seen by comparing column \(T = 1\) in Table 5 with column \(\delta = 5\) in Table 6. Both have the same most influential Nodes {27, 10, 3, 24}. The same comment as above concerning Nodes {8, 17, 22, 31} also holds for \(\delta = 5\).
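The column matching described here can be sketched as follows. This is only one possible matching rule, assumed for illustration: it counts the positions among the first \(k\) entries at which two rankings name the same node. The function names are hypothetical; the rankings are the first eight entries of the \(\delta = 5\) ranking column and of the three ranking columns of the \(w_{N} = 0.5\) table.

```python
def positional_agreement(ranking_a, ranking_b, k):
    """Count positions among the first k where both rankings name the same node."""
    return sum(1 for a, b in zip(ranking_a[:k], ranking_b[:k]) if a == b)


def best_matching_column(reference, columns, k):
    """Return the label of the column that agrees best with the reference ranking."""
    return max(columns, key=lambda label: positional_agreement(reference, columns[label], k))


# First eight entries of the ranking columns (copied from the tables).
delta_5 = [27, 10, 3, 24, 7, 4, 15, 30]
columns_w05 = {
    "T = 1": [27, 10, 3, 24, 4, 30, 8, 17],
    "T = 3": [27, 3, 24, 10, 4, 30, 20, 7],
    "T = 10": [8, 17, 22, 31, 4, 30, 3, 20],
}

print(best_matching_column(delta_5, columns_w05, k=4))  # T = 1
```

With \(k = 4\) the column \(T = 1\) agrees with \(\delta = 5\) in all four positions, which matches the comparison made in the text. Other agreement measures, such as set overlap or rank correlation, could give a different correspondence for larger \(k\).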
Betweenness centrality
Betweenness measures a node's role as a broker between other nodes. In Eq. (12), we presented a new betweenness measure defined by removing one node from the network. The alternative form in Eq. (14) is normalized by the value of Eq. (11), which describes the network structure with all nodes present.
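The node-removal idea can be illustrated with a small, self-contained Python sketch. It does not implement Eqs. (11), (12) or (14). As a loudly stated assumption, the influence between two nodes is replaced by a simple stand-in, `decay ** d`, where `d` is the shortest-path distance, and the betweenness of a node is taken as the relative drop of the summed influence when that node is removed. All function names are hypothetical.

```python
from collections import deque


def total_influence(adjacency, decay=0.5):
    """Stand-in for a network-level measure: sum of decay**distance over all
    ordered pairs of connected nodes (NOT the influence model of the paper)."""
    total = 0.0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    queue.append(v)
        total += sum(decay ** d for node, d in distance.items() if node != source)
    return total


def remove_node(adjacency, node):
    """Return a copy of the graph without the node and its links."""
    return {u: [v for v in nbrs if v != node]
            for u, nbrs in adjacency.items() if u != node}


def removal_betweenness(adjacency, decay=0.5):
    """Relative drop of the stand-in measure when each node is removed."""
    full = total_influence(adjacency, decay)
    if full == 0:
        return {node: 0.0 for node in adjacency}
    return {node: (full - total_influence(remove_node(adjacency, node), decay)) / full
            for node in adjacency}


# Toy path graph 0 - 1 - 2: the middle node acts as the broker.
path = {0: [1], 1: [0, 2], 2: [1]}
scores = removal_betweenness(path)
print(scores[1], scores[0] == scores[2])  # 1.0 True
```

Because the same stand-in measure is used with and without the removed node, the sketch keeps the property emphasized in the text: the centrality-type quantity and the betweenness-type quantity are derived from one formulation.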
Results for community detection
Structure of the territory network of the game of Risk
The game of Risk has also been used in the literature [37] as one of the test networks for community discovery algorithms. The network is neither a human nor an animal social network, so real-life interpretations of the model parameters may not be valid. It serves as an example of analysing the structure of general networks, not just social networks and communities. This artificial network also turns out to be the simplest of our four example networks. The social network of 32 Dutch students is analysed after the basic ideas have been presented with the help of the Risk network.
The board game is played on a political map consisting of six continents, which are further divided into 42 territories. The territories are connected by boundaries and waterways. The goal of the game is to conquer as much land as possible.
The 17 different divisions of the 42 territories of the Risk game detected as local maxima of Eq. (15)
North America | Europe | Asia | |||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
R | # | # | Equation (15) | Nodes | |||||||||||||||||||||||||||
1 | 38 | 4 | 128.0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 |
2 | 4 | 38 | 124.4 | ||||||||||||||||||||||||||||
3 | 33 | 9 | 122.8 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | |||||||||
4 | 34 | 8 | 122.3 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 |
5 | 13 | 29 | 121.1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |||||||||||||||||||
6 | 29 | 13 | 120.6 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | |||||||||
7 | 25 | 17 | 119.0 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | |||||||||
8 | 6 | 36 | 118.2 | 18 | 19 | 20 | 21 | 24 | 25 | ||||||||||||||||||||||
9 | 10 | 32 | 116.3 | 18 | 19 | 20 | 21 | 24 | 25 | ||||||||||||||||||||||
10 | 15 | 27 | 116.1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 18 | 19 | 20 | 21 | 24 | 25 | |||||||||||||
11 | 19 | 23 | 114.2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 18 | 19 | 20 | 21 | 24 | 25 | |||||||||||||
12 | 15 | 27 | 114.1 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 27 | 28 | |||||||||||||||||
13 | 23 | 19 | 112.5 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 18 | 19 | 20 | 21 | 24 | 25 | |||||||||||||
14 | 32 | 10 | 112.4 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 22 | 23 | 26 | 27 | 28 | ||||||
15 | 18 | 24 | 112.2 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 26 | ||||||||||||||||||||
16 | 11 | 31 | 112.1 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 27 | 28 | |||||||||||||||||
17 | 14 | 28 | 110.6 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 26 |
South America | Africa | Australia | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
R | # | # | Equation (15) | Nodes | |||||||||||||
1 | 38 | 4 | 128.0 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | ||||
2 | 4 | 38 | 124.4 | 29 | 30 | 31 | 32 | ||||||||||
3 | 33 | 9 | 122.8 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 |
4 | 34 | 8 | 122.3 | 33 | 34 | 35 | 36 | 37 | 38 | ||||||||
5 | 13 | 29 | 121.1 | 29 | 30 | 31 | 32 | ||||||||||
6 | 29 | 13 | 120.6 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | ||||
7 | 25 | 17 | 119.0 | 33 | 34 | 35 | 36 | 37 | 38 | ||||||||
8 | 6 | 36 | 118.2 | ||||||||||||||
9 | 10 | 32 | 116.3 | 39 | 40 | 41 | 42 | ||||||||||
10 | 15 | 27 | 116.1 | ||||||||||||||
11 | 19 | 23 | 114.2 | 39 | 40 | 41 | 42 | ||||||||||
12 | 15 | 27 | 114.1 | 39 | 40 | 41 | 42 | ||||||||||
13 | 23 | 19 | 112.5 | 29 | 30 | 31 | 32 | 39 | 40 | 41 | 42 | ||||||
14 | 32 | 10 | 112.4 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | ||||
15 | 18 | 24 | 112.2 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | ||||
16 | 11 | 31 | 112.1 | ||||||||||||||
17 | 14 | 28 | 110.6 | 33 | 34 | 35 | 36 | 37 | 38 |
The first five lines in Table 8 clearly show the continents Australia, South America and North America. On lines 1–7 Europe, Africa and Asia are joined together. Only on line 12 is Asia, without territory 26, identified as a division. In addition, sub-communities of five {17, 22, 23, 27, 28} and six {18, 19, 20, 21, 24, 25} territories are identified within Asia.
With the parameters used, the algorithm does not discover Europe and Africa as individual divisions. Node 26 is incorrectly assigned to the combined coalition of Europe and Africa. Classifications in the literature are reviewed in [41], where three out of five algorithms (FastQ, LPA and PPC) also misclassify Node 26. Three algorithms (LPA, Infohiermap and PPC) also extract the same sub-communities of five and six nodes in Asia. Two algorithms, Infohiermap and the active semi-supervised algorithm of [41], identify Europe and Africa and their territories correctly.
To summarize the results for the artificial Risk game network: characteristics of human social networks are not assumed to be valid, but we can compare the results with other algorithms in the literature. The results are similar, with the exception of two factions of the network being identified as one group. Few other algorithms detect the correct nodes of the two communities although they detect the communities themselves [41]. Some of the algorithms in the literature use supervised or assisted methods, which can lead to more accurate results.
In analysing the Risk game network we use a tabular form to represent the different community structures in the network. This is an illustrative and useful way of detecting communities and sub-communities in a social network. The role of the numerical values of the community detection measure of Eq. (15) is highlighted by examining the lines of the table in the order of the values of the measure.
Community structures of the 32 Dutch students’ social network
As the second case for analysing community structures, we use the longitudinal friendship network among 32 Dutch students on the fourth wave of the collected data [36]. Two students are considered to be friends if either or both of them named the other as a friend. A graphical representation of the friendship network is shown in Fig. 1.
The social network is analysed with two different model parameters \(w_{N} = 0.5\) and \(w_{N} = 1.0\) describing strength of the friendship relations. In both cases the parameter values \(\lambda = 0.5, T = 1.0, w_{L} = 1.0\) and \({L_ \text{max} } = 6\) are used. These parameter values are used for all connections in the network.
Different divisions into two factions of the social network of Fig. 1
For weaker connections of \(w_{N} = 0.5\) (or for low values of spreading time \(T\)) the highest local maximum is \(M = 22.3\) for the two factions {1, 9, 11} and {2, 3,…, 8, 10, 12,…32}. This division is not a local maximum for stronger connections of \(w_{N} = 1.0\) at the same time \(T = 1.0\). Weakly connected small sub-groups of this kind can exist at an early development phase of friendship relations.
A larger division B, on the left side of Fig. 1, into 11 and 18 nodes can be discovered for both weak \(w_{N} = 0.5\) and strong \(w_{N} = 1.0\) connections. This is also the case for the tightly connected sub-group {8, 10, 17, 22, 31} of division C. The value of the community detection measure for division D is almost as high as for C, even though division O, with four additional nodes \(\left( O = D \cup \{3, 15, 25, 32\} \right)\), has a slightly higher value for the strongly connected \(w_{N} = 1.0\) network. Division E \(\left( E = D \cup \{15, 25, 32\} \right)\), which is almost the same as division O, can be found for weak connections but not for strong connections.
Node 3 is a gateway node with high betweenness values in Figs. 5, 6 and 8. Node 3 belongs to the exceptional sub-groups of divisions L and Q. The sub-group of nodes {2, 3, 7, 14, 15, 16, 21, 23, 24, 25, 26, 32} in division L rules out three separate factions A, C, and D. Other examples of unconnected factions can be found in Table 9 in divisions F, G, I, J, K, M, N, and P. In these cases a strongly connected sub-group disconnects the second faction of the division. In this way more than two separate sub-groups can build up as a result of the dynamic behaviour of social networks.
In a typical situation sub-groups are nested, for example, \(A \subset F,G,N,Q\), \(C \subset G,I,J,K,M,N,P\) and \(D \subset E,F,I,J,O,P.\) Often, sub-groups are unions, for example, \(F = A \cup D\), \(G = A \cup C\), and \(N = A \cup C \cup H\). In many cases, differences of combined sub-groups are stand-alone sub-groups. However, for example, {15, 25, 32} and {2, 7, 14, 16, 21, 26} are not separate sub-groups in any divisions. This can be sensitive to model parameters.
Community structures of a social network of dolphins
The third dataset we have selected to test the proposed community structure algorithm is the data of dolphin association collected for a research programme [38] of a community of 62 bottlenose dolphins over a period of 7 years. The network describing interactions of dolphins represents one of the real-world networks for which the community structure is already known. The social network has been analysed in [42] with a method published in [43, 44]. Two communities and four sub-communities were detected in the dolphin network. A temporary disappearance of the dolphin denoted by SN100 led to the fission of the dolphin community into two factions.
The animal social network is found to be similar to a human social network in some respects but different in others, such as the level of assortative mixing by degree within the population. Assortative mixing by age but not by vertex degree is observed in the dolphin social network [42]. Assortative mixing is a bias in favour of connections between network nodes with similar characteristics in complex networks [45]. In the model of this paper this may favour a lower value of the maximal path length for the dolphin social network than for human social networks.
Optimal divisions of the dolphin social network
Division | Network | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|
Factions | 0 + 62 | 21 + 41 | 15 + 47 | 35 + 27 | 39 + 23 | 38 + 24 | 22 + 40 |
Eq. (15) | 122.2 | 115.9 | 114.0 | 94.9 | 93.9 | 92.6 | 92.5 |
Division C is an analogous division on the other side of the main split boundary of dolphin SN100 and a group of 12 closely interconnected dolphins in the lower right part of Fig. 11. These 12 nodes are exactly the nodes identified by [42] as a sub-community in the network. Three additional divisions D, E, and F are also indicated in Fig. 11.
We have also computed the results with Node SN100 removed from the original network. The results are nearly the same. Divisions A and B are the same as in Fig. 11. In Division C, one node (Zap) has moved to the other side. The fourth division of 16 nodes is in the lower left corner of Fig. 11 with dolphin SN9. The fifth and sixth divisions are exactly divisions D and E in Fig. 11.
The betweenness of nodes in the dolphin social network has been studied in [46]. Values of Eq. (12) have been calculated for two different parameter values describing low and high cohesion of the network. The results are very different for these two cases. At early phases of influence spreading, the nodes with the highest betweenness differ from those at later phases, because later more nodes have already been affected. At early phases, local characteristics and neighbouring nodes control the spreading process, and node degree describes centrality in these situations.
In fact, the highest node degrees of the dolphin social network belong to dolphins Grin, SN4 and Topless, with degrees of 12, 11 and 11, respectively. In the low cohesion network, the nine nodes with the highest values of the betweenness measure are {Grin, SN4, Topless, Scrabs, SN9, Kringel, Patchpac, Trigger, TR99}.
In the high cohesion network, the eleven nodes with the highest values of betweenness measure are {SN100, Beescratch, SN9, Trigger, SN9, Trigger, SN4, Jet, Scrabs, Stripes, Kringel}. These results are in agreement with the results in [42] identified using the betweenness-based algorithm of [43].
Low cohesion exists at an early phase of influence spreading or when nodes’ activities are low, i.e. low node weighting factors. A result of this is that corresponding pairs of time and weighting factor values can be found such that they provide comparable results. In [46] this has been demonstrated in cases of low values of time with high values of weighting factors, and high values of time with low values of weighting factors. Almost identical results are obtained for \(T = 1.0,w_{N} = 1.0\) and \(T = 4.5,w_{N} = 0.5\).
According to [42], the dolphin community has existed for quite a long time. On the other hand, positive assortative mixing by degree, which is often observed in human social networks, was not observed in that study. However, clear and statistically significant assortative mixing by sex has been observed in the dolphin population, although the mixing is not as strong as some types of mixing in human societies [42, 45].
Because of the lack of positive assortative mixing by degree in the results of [42], we conclude that a relatively low value of \(T = 1.0\) is appropriate. The value \(w_{N} = 0.5\) for the node weighting factors is used in Fig. 11. We have also experimented with higher values of the weighting factors and of time. In later development phases of influence spreading, the local maxima of the community detection measure of Eq. (15) level off and fewer sub-communities are discovered. At time \(T = 1.0\) with \(w_{N} = 1.0\) only one division is detected, which is exactly Division A in Fig. 11. Applying the method under low and high cohesion can be regarded as a way of examining a network at different resolutions.
To summarize the results for the dolphin social network: the split of the dolphin population into two factions after the temporary disappearance of dolphin SN100 is predicted by the model, with the exception of one dolphin, SN89. In the literature, sub-communities have been identified using the betweenness-based algorithm of [43]. The proposed model does not predict the same sub-communities, but they can be identified by investigating the boundaries of the different divisions predicted by the model.
It is an open question whether the same model parameters are appropriate for dolphin and human social networks. We conclude from the research published in [42] that low values of time \(T\) or of the node weighting factors \(w_{N}\) might be more appropriate for dolphin social networks. The same reasoning applies to the maximum path length \(L_{ \text{max} }\) used in the model. In the proposed model, the consistent procedure is to use the same parameters for the community detection algorithm and for the closeness and betweenness centrality measures.
Community structures of a Facebook social network
In this section we show results for a Facebook social network. The dataset consists of ‘circles’ (or lists of friends). The network data have 4039 nodes and 88,234 links between nodes, and include nested and overlapping communities. Here, our main focus is on presenting the features of our model and methods for larger social networks. Because of the detailed modelling, in which all nodes are considered when the influence between nodes is calculated, complex phenomena appear which may not be present in very small social networks. We also provide strategies for optimizing the community detection calculations to minimize computing times. The Facebook social network is the same as that used in [11].
The analysis is conducted with the entire social network data, with loops allowed (except self-loops), maximum path lengths \(L_{ \text{max} } = 6\), node weighting factors \(w_{N} = 0.1\), and link weighting factors \(w_{L} = 1.0\). The community detection measure of Eq. (15) is computed along the paths determined by the 88,234 links between nodes. As a result of the influence spreading process, all the nodes inside the maximum path length \(L_{ \text{max} }\) in a connected graph are influenced by a node and the corresponding elements of the \(N \times N\) matrix \(C_{s,t}\) have positive values. The full analysis of the network considers all these elements.
All the information for detecting communities and their relations in a network consisting of \(N\) nodes is included in one \(N \times N\) matrix, which has influence measures of Eq. (6) from \(N\) nodes to all the other nodes in the network. Because diagonal elements of the matrix have no effect on the community structure, we set the diagonal elements to zero. We show selected results of detailed structures of the network while the calculations have been conducted using the whole network data. Therefore, the method is global, not local, in this respect. This means that all the interactions have influenced the results, centrality and betweenness measures and community structures.
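The handling of the influence matrix can be sketched in plain Python. The sketch assumes the \(N \times N\) matrix is stored as a list of rows, with entry `[s][t]` holding the influence of source node `s` on target node `t`; this storage convention, the function names and the toy values are assumptions made for illustration and are not taken from the paper. After the diagonal is set to zero, row sums and column sums give simple out-directed and in-directed totals per node.

```python
def zero_diagonal(matrix):
    """Return a copy of the matrix with the diagonal elements set to zero."""
    return [[0.0 if s == t else value for t, value in enumerate(row)]
            for s, row in enumerate(matrix)]


def out_influence(matrix):
    """Row sums: total influence each node exerts as a source."""
    return [sum(row) for row in matrix]


def in_influence(matrix):
    """Column sums: total influence each node receives as a target."""
    return [sum(column) for column in zip(*matrix)]


# Toy 3 x 3 matrix, not data from the paper.
toy_matrix = [[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.0],
              [0.125, 0.25, 1.0]]

c = zero_diagonal(toy_matrix)
print(out_influence(c))  # [0.75, 0.5, 0.375]
print(in_influence(c))   # [0.625, 0.75, 0.25]
```

For a network of the size of the Facebook data a dense list of lists is likely to be slow, so an array library or a sparse representation would probably be preferable in practice.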
We have detected 551 sub-structures in the network. Most of these are nested structures inside communities. In large networks, many levels of nested sub-communities can exist. The algorithm provides a list of the nodes included in each of the 551 communities and the corresponding values of the community detection measure of Eq. (15). We order the communities by the values of the measure and use the ranking of a community as a unique identifier (in this section only the smaller factions of divisions are studied). The analysis should start from important communities, their internal structures and relationships with other communities.
Nested sub-communities are indicated in Fig. 13 as \(C_{5} \subset C_{6} \subset C_{1}\) and \(C_{4} \subset C_{3} \subset C_{2} \subset C_{1}\). Two different divisions of the 59 nodes are shown with dotted lines in Fig. 13 as \(C_{1} = C_{4} + C_{6}\) and \(C_{1} = C_{3} + C_{5}\). As can be seen in Fig. 12, three genuine overlapping cases exist: \(C_{6} - C_{5} = S_{2}\), \(C_{2} - C_{3} = S_{8}\), and \(C_{5} - C_{2} = S_{32}\). We denote these intersecting sets of nodes by \(S_{2}\), \(S_{8}\), and \(S_{32}\) because they have not been detected as sub-communities [local maxima of Eq. (15)] and have no community identifier in Fig. 12. However, together the three sets form sub-community \(C_{6}\), denoted by \(C_{6} = S_{2} + S_{8} + S_{32}\). Again, these sets of nodes may appear as sub-communities with lower parameter values. In fact, intersections of detected communities are candidates for new sub-communities.
In many cases a community has nested structures composed of sub-communities lower in hierarchy like the two examples in Fig. 13. This is a consequence of the fact that influence spreading is considered globally or at least inside the path length \(L_{ \text{max} }\), if it is not set to infinity. Also, when a nested sub-community is detected inside a community, the other faction \(S\) may not be detected as a sub-community. An example is shown in Fig. 12 where \(C_{3} = S_{2} + C_{4}\). This does not exclude the possibility that \(S\) is a constituent of a sub-community on higher levels. An example in Fig. 14 is the sub-community \(C_{8}\) of 264 nodes composed of sub-community \(C_{9}\) and a set of 32 nodes \(S_{32}\) \(\left( {C_{8} = S_{32} + C_{9} } \right)\). This is possible because community \(C_{1}\) and its sub-structures are also nested sub-structures of community \(C_{8}\) as can be seen in Fig. 14.
Full investigation of a large social network is a major task. In practice, the analysis is started from the most important communities detected from the network. This has been the idea in Figs. 13, 14. Alternatively the analysis is focused on communities, or nodes, of special interest. Different search criteria for the analysed results can be used, for example, node numbers, community identifiers and values of community measures. Nested structures are discovered by comparing the compositions (nodes) of the detected communities. Deep nested hierarchies can exist when some sub-communities extend their influence widely, like \(C_{1}\) in Figs. 12 and 14.
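The bookkeeping described in this section (ordering communities by the value of the measure, using the rank as an identifier, and finding nested structures by comparing node compositions) can be sketched as follows. The function names and the toy data are hypothetical, and taking the smallest strict superset as the immediate parent is one simple convention assumed for this sketch.

```python
def rank_communities(measured):
    """Order (measure value, node set) pairs by descending measure and
    use the rank, starting from 1, as the community identifier."""
    ordered = sorted(measured, key=lambda item: -item[0])
    return {rank: frozenset(nodes) for rank, (_, nodes) in enumerate(ordered, start=1)}


def immediate_parents(communities):
    """Map each community to the smallest detected community that strictly
    contains it, or to None when no such community exists."""
    parents = {}
    for cid, nodes in communities.items():
        supersets = [(len(other), oid) for oid, other in communities.items()
                     if nodes < other]  # strict subset test
        parents[cid] = min(supersets)[1] if supersets else None
    return parents


# Toy data, not results from the paper: (measure value, nodes).
toy_measured = [(10.0, {1, 2, 3, 4, 5, 6}), (8.0, {1, 2, 3}), (9.0, {1, 2, 3, 4})]
toy_ranked = rank_communities(toy_measured)
print(immediate_parents(toy_ranked))  # {1: None, 2: 1, 3: 2}
```

Following the parent links upwards from any community yields its chain of enclosing communities, which is the kind of nested hierarchy discussed for \(C_{1}\) above.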
Conclusion
We consider a model with one ego initiating the influence spreading process in a social network. This allows us to study different phenomena in structured networks. In practical calculations, the proposed model can also be used for simultaneous source nodes of influence spreading. Dynamic measures for spreading in a social network are used as measures for centrality and betweenness. These measures are functions of network activity and time. Time can be interpreted as the development phase of social relations in a social network. A steady state is reached at high values of time. Therefore, measures describing centrality, betweenness or other characteristics of a network, can be calculated for the steady state or for different development phases of a network.
The proposed model takes into account different paths of the network from a source node to target nodes. Secondly, the dependency of paths is modelled by considering common links at the beginning of the paths. Combining these aspects is the novelty of the model compared to other models in the literature. These features of the model enable many opportunities to study new phenomena in complex networks and to solve existing problems more accurately.
Highly connected peripheral groups have multiple possible paths at the beginning of the spreading process which emphasizes the importance of these nodes as influential spreaders. It is known that initial spreading dynamics is crucial for later development of dynamic processes in a network [47].
We consider networks with a constant structure of nodes and links. Influence is spreading in the network and one node can spread similar influence repeatedly. Results in several recent articles [48, 49, 50] indicate that peripheral nodes, which are not highly influential, have more spreading power than most of the existing models predict. Our study provides evidence supporting these statements. Allowing loops (a node is allowed in the same chain of links more than once) further enhances the importance of interconnected peripheral nodes with low connectivity to central core network structures.
Activity of nodes has a nonlinear effect on rankings of the most influential nodes in the network. For example, if nodes’ activity is lower, the prominence of peripheral connected nodes is higher. We can say that the activity of nodes is an important aspect and models should take activity as one of the main variables of dynamic social network analysis and influence measures. In the model, activity is described by node and link weighting factors.
A new community detection measure is proposed in this paper. The community detection algorithm can be used to analyse possible sub-communities or closely connected members of the network. The idea in analysing community structures is based on the concept of nodes’ role in the network as sources and targets of influence. Both of these aspects have a role in community formation. The algorithm computes local maxima of an influence measure which considers both in- and out-directions of influence. Typically, social networks with weak interactions between nodes or social networks that are at their early development phases have several local maxima with different compositions. These factions can intersect and overlap with each other.
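The notion of a division being a local maximum can be illustrated with a generic hill-climbing sketch. It does not reproduce Eq. (15) or the steps of the paper's algorithm. As an explicit assumption, the objective is a stand-in that sums the influence between all ordered pairs of nodes lying in the same faction, so that both directions of influence are counted; the function names and the toy matrix are hypothetical.

```python
def within_faction_influence(influence, faction):
    """Stand-in objective (NOT Eq. (15)): influence summed over ordered pairs
    of distinct nodes that are on the same side of the two-faction division."""
    inside = set(faction)
    n = len(influence)
    return sum(influence[s][t] for s in range(n) for t in range(n)
               if s != t and (s in inside) == (t in inside))


def climb_to_local_maximum(influence, start_faction):
    """Move one node at a time to the other faction while the objective grows."""
    faction = set(start_faction)
    score = within_faction_influence(influence, faction)
    improved = True
    while improved:
        improved = False
        for node in range(len(influence)):
            candidate = faction ^ {node}  # move the node to the other faction
            candidate_score = within_faction_influence(influence, candidate)
            if candidate_score > score + 1e-12:
                faction, score, improved = candidate, candidate_score, True
    return faction, score


# Toy matrix: two groups of three nodes, strong influence (1.0) inside a group
# and weak influence (0.1) between the groups.
toy_influence = [[0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.1)
                  for j in range(6)] for i in range(6)]

faction, score = climb_to_local_maximum(toy_influence, {0, 1, 2})
print(sorted(faction), score)  # [0, 1, 2] 12.0
```

Different starting divisions can end in different local maxima, which is in line with the statement that weakly interacting networks have several local maxima with different compositions. With this stand-in objective the undivided network scores highest whenever all influences are positive, so only non-trivial local maxima are informative.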
In this paper, we propose a consistent modelling framework for computing powerful influence spreaders and mediators in a social network. The same theory can be used in analysing community structures. The method is discussed and illustrated with several examples and graphical presentations.
Notes
Authors’ contributions
The author read and approved the final manuscript.
Acknowledgements
The author would like to thank Mr. Matti Syrjänen for writing down the pseudo-code of the algorithm based on the computer programme code developed by Janne Levijoki and Matias Ijäs.
Competing interests
The author declares that he has no competing interests.
Availability of data and materials
Not applicable.
Funding
No funding.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Borgatti SP, Everett MG. A graph-theoretic perspective on centrality. Soc Netw. 2006;28:466–84.
- 2. Borgatti SP. Identifying sets of key players in a social network. Comput Math Organiz Theor. 2006;12:21–34.
- 3. Agneessens F, Borgatti SP, Everett MG. Geodesic based centrality: unifying the local and the global. Soc Netw. 2017;49:12–26.
- 4. Newman MEJ. A measure of betweenness centrality based on random walks. Soc Netw. 2003;27(1):39–54.
- 5. Malliaros DF, Rossi M-EG, Vazirgiannis M. Locating influential nodes in complex networks. Sci Rep. 2016;6:19307.
- 6. Gruhl D, Guha R, Liben-Nowell D, Tomkins A. Information diffusion through blogspace. WWW’04; 2004. p. 491–501.
- 7. Kempe D, Kleinberg J, Tardos É. Maximizing the spread of influence through a social network. SIGKDD’03 Washington, DC; 2003. p. 137–46.
- 8. Kempe D, Kleinberg J, Tardos É. Influential nodes in a diffusion model for social networks. In: Proceedings of 32nd international colloquium on automata, languages and programming; 2005. p. 1127–38.
- 9. Morone F, Min B, Bo L, Mari R, Makse HA. Collective influence algorithm to find influencers via optimal percolation in massively large social media. Sci Rep. 2016;6:30062.
- 10. Pei S, Morone F, Makse HA. Theories for influencer identification in complex networks. In: Lehman S, Ahn Y-Y, editors. Spreading dynamics in social systems. Berlin: Springer; 2017.
- 11. Ijäs M, Levijoki J, Kuikka V. Scalable algorithm for computing influence spreading probabilities in social networks. In: 5th European conference on social media, Limerick Institute of Technology (ECMS 2018), Ireland. 2018.
- 12. Watts DJ. Six degrees: the science of a connected age. London: W. W. Norton & Company Ltd.; 2004.
- 13. Fortunato S, Hric D. Community detection in networks: a user guide. Phys Rep. 2016;659(11):1–44.
- 14. Newman MEJ, Park J. Why social networks are different from other types of networks. Phys Rev E Stat Nonlin Soft Matter. 2003;68:036122.
- 15. Miller JC, Kiss IZ. Epidemic spread in networks: existing methods and current challenges. Math Model Nat Phenom. 2014;9(2):4–42.
- 16. Newman MEJ. The structure and function of complex networks. SIAM Rev. 2003;45:167–256.
- 17. Karrer B, Newman MEJ. Stochastic blockmodels and community structure in networks. Phys Rev E. 2011;83(1):016107.
- 18. Lai D, Lu H, Nardini C. Enhanced modularity-based community detection by random walk network preprocessing. Phys Rev E. 2010;81(6):066118.
- 19. Newman MEJ. Detecting community structure in networks. Eur Phys J B. 2004;38(2):321–30.
- 20. Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D. Defining and identifying communities in networks. Proc Natl Acad Sci USA. 2004;101(9):2658–63.
- 21. Shen H-W, Cheng X-Q, Guo J-F. Quantifying and identifying the overlapping community structure in networks. J Stat Mech. 2009;2009:P07042.
- 22. Thakur GS, Tiwari R, Thai MT, Chen S-S, Dress AWM. Detection of local community structures in complex dynamic networks with random walks. IET Syst Biol. 2009;3(4):266–78.
- 23. Xiang J, Wang Z-Z, Li H-J, Zhang Y, Li F, Dong L-P, Li J-M, Guo L-J. Community detection based on significance optimization in complex networks. J Stat Mech. 2017;2017:053213.
- 24. Coscia M, Giannotti F, Pedreschi D. A classification for community discovery methods in complex networks. Stat Anal Data Mining. 2011;4(5):512–46.
- 25. Lancichinetti A, Fortunato S. Community detection algorithms: a comparative analysis. Phys Rev E. 2009;80:056117.
- 26. Shuo L, Chai B. Discussion of the community detection algorithm based on statistical inference. Persp Sci. 2016;7:122–5.
- 27. Newman MEJ. Networks, an introduction. Oxford: Oxford University Press; 2010.
- 28. Kernighan BW, Lin S. An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J. 1970;49(2):291–307.
- 29. Fiedler M. Algebraic connectivity of graphs. Czechoslov Math J. 1973;23(98):298–305.
- 30. Pothen A, Simon H, Liou K-P. Partitioning sparse matrices with eigenvectors of graphs. SIAM J Matrix Anal Appl. 1990;11:430–52.
- 31. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech. 2008;10:P10008.
- 32. Rosvall M, Bergstrom CT. Maps of random walks on complex networks reveal community structure. PNAS. 2008;105(4):1118–23.
- 33. DasGupta B, Desai D. On the complexity of Newman’s community finding approach for biological and social networks. J Comput Syst Sci. 2013;79:50–67.
- 34. Agarwal G, Kempe D. Modularity-maximizing graph communities via mathematical programming. Eur Phys J B. 2008;66(3):409–18.
- 35. Berman P, DasGupta B, Kaligounder L, Karpinski M. On the computational complexity of measuring global stability of banking networks. Algorithmica. 2014;70(4):595–647.
- 36. Van de Bunt GG. Friends by choice. An actor-oriented statistical network model for friendship networks through time. Amsterdam: Thesis Publishers; 1999.
- 37. https://en.wikipedia.org/wiki/Risk_(game). Accessed 8 Aug 2017.
- 38. Lusseau D. The emergent properties of a dolphin social network. Proc R Soc Lond B. 2003;270:186–8.
- 39. Freeman LC. Centrality in social networks: conceptual clarification. Soc Netw. 1979;1:215–39.
- 40. Katz L. A new status index derived from sociometric analysis. Psychometrika. 1953;18(1):39–42.
- 41. Cheng J, Leng M, Li L, Zhou H, Chen X. Active semi-supervised community detection based on must-link and cannot-link constraints. PLoS ONE. 2014;9(10):e110088.
- 42. Lusseau D, Newman MEJ. Identifying the role that individual animals play in their social network. Proc R Soc Lond B (Suppl.). 2004;271:477–81.
- 43. Girvan M, Newman MEJ. Community structure in social and biological networks. Proc Natl Acad Sci USA. 2002;99(12):7821–6.
- 44. Newman MEJ, Girvan M. Finding and evaluating community structure in networks. Phys Rev E. 2004;69:026113.
- 45. Newman MEJ. Mixing patterns in networks. Phys Rev E. 2003;67(2):026126.
- 46. Kuikka V. Influence spreading model used to community detection in social networks. In: Cherifi C, Cherifi H, Karsai M, Musolesi M, editors. Complex networks & their applications VI. COMPLEX NETWORKS 2017. Studies in computational intelligence, vol. 689. Cham: Springer; 2018. p. 202–15.
- 47. Zou CC, Towsley D, Gong W. Modeling and simulation study of the propagation and defence of Internet email worm. IEEE Trans Dependable Secure Comput. 2007;4(2):105–18.
- 48. Šikić M, Lančić A, Antulov-Fantulin N, Štefančić H. Epidemic centrality – is there an underestimated epidemic impact of network peripheral nodes? Eur Phys J B. 2013;86:440.
- 49. Lawyer G. Understanding the influence of all nodes in a network. Sci Rep. 2015;5:8665.
- 50. Csermely P, London A, Wu I-Y, Brian B. Structure and dynamics of core/periphery networks. J Compl Netw. 2013;1:93–123.
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.