1 Introduction

Many real-world networks, such as online social networks, transportation networks, email networks, and collaboration networks, are complex weighted networks [1]. These networks can be modeled as graphs \(G = (V, E, W)\), where V is the set of nodes, E is the set of edges between nodes, and W is the set of edge weights. The weight of a connection between two nodes usually depends on the exchange of services, intensity, or duration [2]. If two nodes interact frequently or intensely, then information or diseases are more likely to be transferred between them [3]. Complex networks have a large number of nodes, and the interaction between nodes is usually complicated. The evolution of complex networks has led to many useful applications such as influence maximization, viral marketing, and information propagation [4]. Influence maximization [5] selects a constant number of seed nodes that are capable of spreading information by word of mouth once the information source is known, which makes it valuable in business. Its applications include controlling the proliferation of messages and rumors, positioning influential researchers, and discovering social leaders. The process consists of two phases: identifying the seed spreaders, and diffusing the information. The circulation of information in a complex network closely resembles epidemic spreading, and many popular methods for modeling information diffusion are based on epidemic spreading [6, 7]. In epidemiology, mathematical models serve as tools for analyzing the spread and control of infectious diseases [8]. Many researchers have successfully applied epidemic spreading models to information propagation in complex networks to estimate the final spread of the information originating from the source nodes.

Most influence maximization models for weighted networks are merely extensions of their unweighted counterparts, obtained by introducing edge weights into the models. Numerous other models identify important spreaders using standard graph-theoretical features. In this paper, we propose HookeRank, a method based on Hooke’s law of elasticity, to identify spreaders in a weighted network. Our algorithm measures the influence of nodes in a setting where edges are modeled as springs and edge weights are modeled as spring constants. The springs are connected in series and in parallel. They elongate by a distance under the effect of an assumed constant force following Hooke’s law of elasticity, and this elongation is the equivalent propagation distance between nodes in the network. The contributions of our work are as follows:

  1. We propose a novel method based on Hooke’s Law of Elasticity in complex weighted networks to find the influential spreaders.

  2. We model the equivalent weight between indirectly connected nodes in a weighted network.

  3. The proposed algorithm is an improved method of selection of influential nodes on real-world data-sets.

The rest of the paper is organized as follows: Sect. 2 consists of the related work in this field. Section 3 presents the data-sets and the models for information diffusion used in this paper. In Sect. 4, we describe the methodology of our novel method and simulation on a toy network. Section 5 discusses the simulation of our algorithms on various real-world networks. Finally, Sect. 6 concludes our paper.

2 Traditional Centralities

Early models for influence maximization were mostly developed for unweighted networks, where all edges are equally important. In real-world networks, edges carry weights that must be considered when analyzing the strength of connections during information diffusion. Considering these aspects of topology gives insight into what is most beneficial for maximizing information diffusion. The early advances in weighted networks extended centralities used for unweighted networks, such as DegreeRank, by weighting the edges to obtain a weighted DegreeRank [9]. In a similar pattern, several other centralities for unweighted networks were adapted to weighted graphs through mathematical adjustments. Betweenness centrality considers the shortest paths passing through a node in an unweighted graph, and it was extended to a weighted version, giving the weighted-betweenness centrality [10, 11]. Based on the notion of a voting scheme, researchers have proposed influence maximization algorithms for unweighted as well as weighted networks, in which the node receiving the highest number of votes in each round is selected as a spreader [12,13,14]. The h-index measures the impact of researchers based on the number of citations received, and by incorporating edge weights, Yu et al. proposed a weighted h-index centrality [15]. Weighted-eigenvector centrality, applicable to weighted networks, is based on the idea that a node is important if its neighbors are also important, and computes the centrality of a node as a function of the centralities of its neighbors [16]. Eades [17] suggested modeling the edges of a network as springs to draw graphs by minimizing potential energy. This method was later refined by Fruchterman et al. [18], who model nodes as electrical charges and edges as connecting springs; the electrical charges make the nodes repel each other. One of the most popular algorithms for drawing graphs is Kamada and Kawai’s method, which models the edges of the graph as springs acting according to Hooke’s Law [19, 20]. The method optimizes the length of the spring between any two nodes by minimizing a global cost function. We argue that the spring-based model is also applicable to measuring the centrality of nodes and finding influential spreaders.

3 Datasets and Performance Metrics

3.1 Information Diffusion Model

SIR Model: In this paper, we use the susceptible-infected-recovered (SIR) model as the information diffusion model [21]. This model divides nodes into three categories: susceptible (S), infected (I), and recovered (R). Susceptible nodes can receive information from their infected neighbors. The information starts from a subset of the network nodes and spreads with the spreading parameter (\(\beta \)) and the recovery rate (\(\gamma \)). Initially, all nodes except the seed nodes are in the susceptible state. At each time step, every infected node infects each of its susceptible neighbors with probability \(\beta \). At the next time step, infected nodes enter the recovered state with probability \(\gamma \). Once recovered, nodes are no longer susceptible and cannot be infected again.
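The SIR process described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the simulation code used in the experiments: the function name `simulate_sir`, the adjacency-dictionary input, the synchronous update order, and the default `gamma=1.0` are all assumptions made for the example (the text does not state the value of \(\gamma \)).

```python
import random


def simulate_sir(adjacency, seeds, beta=0.01, gamma=1.0, max_steps=1000, rng=None):
    """Run one discrete-time SIR cascade (hypothetical helper).

    adjacency: dict mapping each node to an iterable of its neighbors.
    seeds: iterable of initially infected nodes.
    Returns a list whose entry t is the number of nodes that are infected
    or recovered after step t (entry 0 counts the seeds only).
    """
    rng = rng or random.Random()
    infected = set(seeds)
    recovered = set()
    history = [len(infected)]
    for _ in range(max_steps):
        if not infected:
            break  # no infected node left, the cascade has ended
        newly_infected = set()
        for node in infected:
            for neighbor in adjacency[node]:
                susceptible = neighbor not in infected and neighbor not in recovered
                # Each infected node infects each susceptible neighbor with probability beta.
                if susceptible and rng.random() < beta:
                    newly_infected.add(neighbor)
        # Each currently infected node recovers with probability gamma.
        newly_recovered = {node for node in infected if rng.random() < gamma}
        infected = (infected - newly_recovered) | newly_infected
        recovered |= newly_recovered
        history.append(len(infected) + len(recovered))
    return history
```

The last entry of the returned list corresponds to the size of the cascade when no infected node remains; averaging it over repeated runs gives an estimate of the final spread for a given seed set.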

3.2 Performance Metrics

The final infected scale (\(f(t_c)\)): This is a measure of the final spread of the information originating from the chosen seed nodes at the end of the SIR simulations. The final infected scale is the final number of recovered users, that is, users that passed from susceptible to infected and finally to recovered during the information diffusion process. There are two ways to measure this criterion. First, \(f(t_c)\) is plotted against time t, which shows the propagation of the information on the network as time proceeds. Second, \(f(t_c)\) is plotted against different fractions of spreaders, which shows how the propagation of the information changes with the number of spreaders initially selected by the algorithm.

3.3 Datasets Used

Table 1. Real world data-sets for simulation

4 Methodologies

In this section, we present the mapping of the edges in the network to springs, which are connected in series and in parallel, and describe the proposed HookeRank method. The individual spring connections are discussed in detail. In a weighted network, a higher weight generally means a stronger connection. The same holds in our model, since Hooke’s law states that the larger the spring constant, the smaller the displacement of the spring. This displacement gives a new unit of distance between any two nodes, and it is the distance we use when information diffusion is considered. By normalizing these distances, we can model not only the approximate form but also the equivalent of all the different paths that exist between any pair of nodes. With this model, an individual graph is created for each node, and the nodes can then be evaluated based on the amount of information they can propagate. We assume that a constant force \(F_0 = 1\) continuously acts on the node chosen as a seed node. Every edge on a path from the seed node is treated as a spring with constant \(k = w_{ij}\), where \(w_{ij}\) is the weight of the edge connecting node i and node j. Using a breadth-first search (BFS) traversal, we calculate the equivalent spring between the seed node and every other node. If multiple edges reach the same node from the previous level, their spring constants are added, since these springs are in parallel. When traversing from a node of one level to the next, we use the series combination to generate the equivalent spring and to calculate the most probable distance. Finally, information is propagated through each of these nodes to find the maximum amount of spreading that takes place (Fig. 1).

Fig. 1. Calculating the heuristic distance between two nodes based on weights of the edges, being modeled using springs and evaluating to a single spring, following Hooke’s Law of Elasticity

Parallel: When springs are placed in parallel, they act as a single joint spring that is stiffer than any of the individual springs, so its spring constant is larger.

Series: The contributions of springs in series can also be combined. Springs in series form a more flexible spring that elongates more than any of the individual springs in the connection.

4.1 Distance Calculation

When two springs of different spring constants, \(k_1\) and \(k_2\) respectively are placed in series with each other, we get (Fig. 2):

$$\begin{aligned} 1/k_{eq} = 1/k_1 + 1/k_2 \end{aligned}$$
(1)
Fig. 2. Demonstration of serial springs following Hooke’s law. Series edges occur when more than one node occurs in the path.

When two springs of different spring constants, \(k_1\) and \(k_2\) respectively are placed in parallel with each other, we get (Fig. 3):

$$\begin{aligned} k_{eq} = k_1 + k_2 \end{aligned}$$
(2)
Fig. 3. Demonstration of parallel springs following Hooke’s law. Parallel edges occur when more than one path of reaching the same node exists.

What this means in an actual network is that springs in series are stiffer if the ties in the individual connections are strong. It also implies that multiple connections from one node to another add up to a single, stronger connection, as seen in the case of parallel connections. The equivalent distance between any two of these nodes under a constant force is given as:

$$\begin{aligned} x = f/k \end{aligned}$$
(3)

Assuming \(f = 1\) without loss of generality, the measure of distance in this network is relative. A method similar to breadth-first search is used to find the equivalent value of k between a node and all of its indirect neighbors.
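Equations 1 to 3 translate directly into code. The following is a minimal sketch; the function names `series`, `parallel`, and `elongation` are hypothetical and chosen only for this illustration.

```python
def series(k1, k2):
    """Equivalent spring constant of two springs in series (Eq. 1)."""
    return 1.0 / (1.0 / k1 + 1.0 / k2)


def parallel(k1, k2):
    """Equivalent spring constant of two springs in parallel (Eq. 2)."""
    return k1 + k2


def elongation(k, force=1.0):
    """Elongation of a spring with constant k under a constant force (Eq. 3)."""
    return force / k


# Two edges of weight 2 in series behave like one softer spring,
# while the same two edges in parallel behave like one stiffer spring.
k_series = series(2.0, 2.0)      # 1.0
k_parallel = parallel(2.0, 2.0)  # 4.0
print(elongation(k_series), elongation(k_parallel))  # 1.0 0.25
```

With unit force, the series pair gives the larger distance (1.0) and the parallel pair the smaller one (0.25), matching the intuition that more or stronger ties bring two nodes closer.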

4.2 Proposed Algorithm

Based on the notion of modeling edges as springs, we compute the centrality of the nodes in the network. Edge weights in the network are the spring-constants of these modeled springs. The proposed HookeRank method uses the following steps:

  1. For each node, we perform a breadth-first search to all other nodes and calculate the distance of each node in series from the nearest neighbor.

  2. In the BFS traversal, when a new node is encountered, we add its distance from its parent using a series combination as in Eq. 1 and add this neighbor to the queue.

  3. All the nodes that occur again in the BFS traversal are assumed to be in a parallel connection, and their spring constants are added according to Eq. 2.

  4. When the queue for a given node is completely processed, we obtain its equivalent tree.

  5. Calculate the HookeRank value for the node by finding the weighted average degree of the given node in the equivalent tree.

  6. This procedure gives us the HookeRank value for each node, and the result is stored in a dictionary containing the node and its HookeRank value.

  7. The objective of influence maximization is to select the top c nodes, where c is a constant. Here, the top c nodes are the nodes with the highest HookeRank scores in the ranking, and these nodes can be chosen as the influential spreaders.
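The steps above can be sketched as follows. This is one possible reading of the procedure, not a reference implementation: the names `equivalent_constants`, `hooke_rank`, and `top_spreaders` are hypothetical, the graph is assumed to be an undirected dictionary of dictionaries, only edges leading from one BFS level to the next are combined, and the score in Step 5 is taken to be the mean equivalent spring constant over all reached nodes.

```python
from collections import deque


def equivalent_constants(graph, source):
    """Equivalent spring constant from `source` to every reachable node.

    graph: dict mapping node -> dict of neighbor -> edge weight (undirected).
    Assumption: only edges from BFS level d to level d + 1 are combined.
    """
    level = {source: 0}
    k_eq = {}
    queue = deque([source])
    while queue:
        parent = queue.popleft()
        for child, weight in graph[parent].items():
            if child == source:
                continue
            if parent == source:
                k_path = weight  # direct neighbor: the edge itself is the spring
            else:
                # Series combination of the parent's equivalent spring and this edge (Eq. 1).
                k_path = 1.0 / (1.0 / k_eq[parent] + 1.0 / weight)
            if child not in level:
                level[child] = level[parent] + 1
                k_eq[child] = k_path
                queue.append(child)
            elif level[child] == level[parent] + 1:
                # Node reached again from another parent: parallel combination (Eq. 2).
                k_eq[child] += k_path
    return k_eq


def hooke_rank(graph):
    """Score each node by its mean equivalent spring constant (assumed reading of Step 5)."""
    scores = {}
    for node in graph:
        k_eq = equivalent_constants(graph, node)
        scores[node] = sum(k_eq.values()) / len(k_eq) if k_eq else 0.0
    return scores


def top_spreaders(scores, c):
    """Return the c nodes with the highest scores (Step 7)."""
    return sorted(scores, key=scores.get, reverse=True)[:c]


# Small weighted path A - B - C with unit weights.
toy = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 1.0}, "C": {"B": 1.0}}
print(top_spreaders(hooke_rank(toy), 1))  # ['B']
```

Because the queue is first-in, first-out, every node of one level is fully processed before any node of the next level is expanded, so a parent's equivalent constant is final by the time it is used in a series combination.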

4.3 Time Complexity of the Proposed Algorithm

The time complexity of the HookeRank method consists of the initialization of the spring constants, the selection of the node with the highest closest-neighbor score, and the level-order traversal with respect to every node (Step 1 to Step 4). A single breadth-first traversal takes \(O(E+V)\) time and is repeated once per node, so the complete algorithm is bounded by \(O(V(E+V))\). In practice this cost reduces, because we know that if the equivalent spring from A to B has the constant k, then the spring from B to A has the same spring constant (Step 6).

5 Results and Analysis

We evaluate the proposed HookeRank method alongside contemporary centrality measures: weighted-degree centrality, weighted-betweenness centrality, weighted-eigenvector centrality, and weighted voteRank. The investigation has been performed on a toy network and on three real-world networks of different nature, application, and size, which are listed in Table 1. We use the SIR model to compute the final infected scale, \(f(t_{c})\), as a function of the spreader fraction and as a function of increasing timestamps. The results were averaged over 100 SIR simulations. For simplicity and to maintain consistency in the analysis across all data-sets, we chose the infection rate (\( \beta \)) as 0.01, meaning that an infected node infects each of its susceptible neighbors with a probability of 1\(\%\).

5.1 Simulation of the Proposed Algorithm On a Toy Network

Here, we simulate the working of the proposed HookeRank method using a toy network, as depicted in Fig. 4. The network is a weighted graph with edge weights representing the stiffness constants of the springs.

Fig. 4. A sample Weighted Network where the edges are modelled as springs and the spring constant is analogous to the weight of the edge

Fig. 5. The run of a breadth-first search with respect to A as the starting node

Fig. 6. The final network with respect to node A. Notice that certain indirect neighbors became direct neighbors under the action of both series and parallel connections.

Let us consider the steps of the algorithm for a sample node A to understand how it works. We take a FIFO queue and put A inside the queue. The distance to A itself is 0. We first compute the spring constants for the first neighbors B, C, D, and E using the direct connection of the spring. These nodes are then popped out, and their neighbors are pushed into the queue. The equivalent spring constant of a newly reached node is found through a series connection through its parent. In the case of multiple parents at the same level, the constant is found using a parallel combination, as in the case of node H. For node A, the equivalent distances of all the other nodes are calculated by this breadth-first search. Its immediate neighbors are processed first, and the different layers are then visited in level order, as shown in Fig. 5. We can now compute the value for all the neighbors of A, as described in Sect. 4.1. The computation results in a graph similar to Fig. 6. Notice the connections and the spring constants. Thus, the average spring constant for A becomes 12.57 / 8 ≈ 1.57, which gives a HookeRank value of 1.57 for node A.

Table 2. Results of HookeRank Score of various nodes in the toy Network

A similar calculation can be performed for each of the nodes, and their equivalent HookeRank values are given in Table 2. Based on the HookeRank score, node H is selected as the top spreader in the toy network.

5.2 Simulation of the Proposed Algorithm on Real-Life Networks

Figures 7, 8, and 9 depict the final infection scale (\(f(t_c)\)) with respect to the percentage of spreaders for the three real-life data-sets with infection rate (\(\beta \)) as 0.01. We take \(2\%\), \(4\%\), \(6\%\), \(8\%\), and \(10\%\) of the nodes as influential seed nodes to plot the final infection scale. In Fig. 7, note that the number of nodes affected by the infection is highest for HookeRank on the US-Airports network for most percentages of spreaders. In Fig. 8, HookeRank greatly exceeds the performance of the other algorithms as the spreader fraction increases. In the weighted PowerGrid data, shown in Fig. 9, HookeRank performs better than most other algorithms from an early stage. In the weighted PowerGrid data, shown in Fig. 10, the increase in the number of spreaders brings WVoteRank marginally close to HookeRank, but our algorithm still performs better than all other algorithms in the simulation.

Fig. 7. The infection scale with respect to the percentage of spreaders on US-Airports network with \( \beta \) = 0.01.

Fig. 8. The infection scale with respect to the percentage of spreaders on Facebook-like weighted network with \( \beta \) = 0.01.

Fig. 9. The infection scale with respect to the percentage of spreaders on US PowerGrid network with \( \beta = 0.01\).

Figure 10 presents the final infection scale (\(f(t_c)\)) with respect to increasing timestamps, with infection rate (\(\beta \)) as 0.01 and the top \(7\%\) influential nodes as seed nodes, on the US PowerGrid network. Figure 11 shows the same quantity with infection rate (\(\beta \)) as 0.01 and the top \(5\%\) influential nodes as seed nodes on the US-Airports network. Figure 12 displays it with infection rate (\(\beta \)) as 0.01 and the top \(5\%\) influential nodes as seed nodes on the Facebook-like weighted network.

Fig. 10. The final infection scale with respect to the time on US PowerGrid Network with \( \beta \) = 0.01 and \( \rho \) = 7\(\%\).

Fig. 11. The final infection scale with respect to the time on US-Airports Network with \( \beta \) = 0.01 and \( \rho \) = 5\(\%\).

Fig. 12. The final infection scale with respect to the time on Facebook-like weighted Network with \( \beta \) = 0.01 and \( \rho \) = 5\(\%\).

From the above results on three real-life networks, HookeRank performs better than the compared methods, namely weighted-degree centrality, weighted-betweenness centrality, weighted-eigenvector centrality, and weighted voteRank (WVoteRank), in terms of the final infected scale with respect to time t and spreader fraction \(\rho \) on the real-world networks listed in Table 1.

6 Conclusion

In this paper, we proposed the HookeRank method for finding influential nodes in weighted networks by modeling the edges of the network as springs and the edge weights as spring constants. We first derived a measure of the distance between indirect neighbors through the series and parallel combination of edges modeled as springs. The HookeRank method and the HookeRank distance can be used to gain a better understanding of the topology of a complex weighted network. By computing the HookeRank value of the nodes, our method locates the top spreaders in a given real-world network, so that the information reaches a large number of people and its spread is maximized. The proposed algorithm incorporates both the local and global properties of a node in the measurement of its spreading capability. We simulated the proposed method along with contemporary methods on three real-life data-sets, using the final infected scale as the basis of evaluation. On these data-sets, the proposed influence maximization algorithm achieved a higher final infected scale than the compared methods in most settings, which indicates that it is effective in real-life scenarios.