Introduction

With the rising popularity of massive-scale online social networks such as Facebook, Instagram, and Twitter, people are more connected and can share information with each other more easily (Zhuang et al. 2013). These platforms play a vital role in the dissemination of positive information such as new ideas, innovations, and hot topics. However, they may also become channels for spreading malicious rumors, misinformation, or even dangerous viruses and malware. Rumor spreading can severely threaten public safety and financial stability. For instance, a rumor posted on social networks about an upcoming large earthquake may cause chaos and disrupt public order. In this situation, it is necessary to find the individuals whose accounts, if deactivated or removed from the network, would block further rumor spreading. This is known as the graph protection problem, where the goal is to protect a number of nodes to restrain epidemic propagation by maximizing the ratio of surviving nodes in a network (Wijayanto and Murata 2017; 2018b). In this problem, the protection budget constrains the number of nodes we are allowed to protect.

Real-world social networks and collaboration networks have highly dynamic structures, and they evolve rapidly over time (Zhan et al. 2017; Wang et al. 2016). The inherently dynamic nature of the network leads to dynamic network representations. Dynamic networks are defined as networks which evolve over time by the addition and removal of nodes and edges (Bakker et al. 2018; Moore et al. 2006; Zhuang et al. 2013). Dynamic networks have temporal relationship features which specify the number of connections among nodes that are active at a certain time.

Restraining epidemic spreading in dynamic networks is more challenging than in static networks because the network structure changes over time. However, most existing work fails to address incoming rumor or virus attacks during the temporal transition in dynamic networks. The existing strategies either pre-emptively protect critical nodes prior to epidemic attacks, behaving as prevention efforts (Prakash et al. 2010; Chen et al. 2016; Wijayanto and Murata 2017), or post-emptively allocate the protection after the epidemics have already propagated over the network, behaving as delayed reactions (Zhan et al. 2017; Zhang et al. 2017; Zhang and Prakash 2015; Song et al. 2015). In this paper, we introduce a multiple-turns graph protection strategy that divides the protection budget into several turns and protects nodes based on the currently observed temporal structure of the network.

Moreover, most current graph protection strategies suffer from at least one of the following drawbacks: (1) protecting only particular areas of the network, as demonstrated by centrality-based methods (Prakash et al. 2010; Buono and Braunstein 2015; Zhao et al. 2014); (2) scalability issues, as demonstrated by dominator tree-based methods (Zhang and Prakash 2014; Zhan et al. 2017; Zhang and Prakash 2015); and (3) lack of convergence guarantees in large networks, as shown by eigendecomposition-based methods (Tong et al. 2010; Chen et al. 2016; Wijayanto and Murata 2017; Prakash et al. 2010). We propose constructing a minimum vertex cover to determine the protected nodes in an efficient and scalable way. The minimum vertex cover (MVC) is the smallest set of nodes that covers all edges of the network. As we explain later, MVC serves as the protection threshold of the network (see “Proposed methods” section for our detailed explanation).

In recent years, reinforcement learning (RL) approaches have obtained many state-of-the-art results in solving various complex problems (Mnih et al. 2015; Riedmiller 2005). RL allows autonomous agents to learn to improve their performance with experience. In this work, we use an RL approach to construct the MVC from the currently observed network snapshot. Specifically, we propose n-step fitted Q-Learning to obtain the MVC solution of the input network, using a neural network as the function approximator. The neural network architecture allows us to accelerate the training and execution of our proposed methods through mini-batch processing and multiple graphics processing units, so that large networks can be handled. In order to handle the different size and structure of each temporal snapshot of a dynamic network, each node is represented as a fixed-length feature vector using a graph embedding technique.

Extensive evaluations on both synthetic and real-world network datasets show that our proposal effectively restrains epidemic spreading. On the Email network dataset, by protecting about 15% of nodes, our methods achieve up to 84% of surviving nodes and outperform the baseline methods. Comprehensive evaluations under the two most popular epidemic models, i.e., SIS and SIR, confirm the effectiveness and scalability of our methods.

The novelty of our methods arises primarily from bringing more stochasticity and learning ability to the graph protection problem, specifically on dynamic networks. In large-scale social networks, changes in relationship structure and rumor spreading patterns may arise on a regular basis. Therefore, there is an opportunity to learn the current condition into a model using reinforcement learning. Learning the given temporal structure of the observed networks and the existing epidemics makes it possible to predict future protection from previously learned actions in the same dynamic networks.

This paper extends our preliminary idea in (Wijayanto and Murata 2018a). In addition to the contents in (Wijayanto and Murata 2018a), this paper includes the following: detailed explanation of the proposed methods; evaluation on synthetic networks, as well as more real-world network datasets; review of the relevant related work; discussion of scalability and computational complexity; evaluation of parameter sensitivity; addition of stronger baseline methods such as Betweenness, GraphShield and NetShield+; and evaluation on the SIR epidemic model.

The remainder of this paper is organized as follows. We formalize the problem and definitions in “Problem formulation” section. The review of the most closely related recent studies is presented in “Related work” section. Our proposed methods, namely ReProtect and ReProtect-p, are described in “Proposed methods” section. The results of experimental simulations are provided in “Evaluation” section. Finally, concluding remarks are provided in “Conclusion” section.

Problem formulation

In this section, we formalize the definitions and problems used throughout this paper. We summarize the symbols and notations in Table 1.

Table 1 Summary of Symbols and Notations

Definition 1. Protecting a node means removing all of its corresponding edges. The number of nodes we are allowed to protect is constrained by the protection budget (\(k \in \mathbb {Z}_{> 0}\)). At time t, a node in a network can be in one of the following states: susceptible or infected. Attacking a node means initially infecting that node in the network. Figure 1 shows an example of protection and attack in a static network.

Fig. 1

Example of a static network. Green colored node indicates the node is protected. Dashed green colored edges indicate the edges are removed or inactivated because of protection. Initial red colored node on network indicates the node is attacked. Other red colored nodes indicate the nodes are infected

Definition 2. Graph Protection Problem

Let G=(V,E) be an undirected connected graph with set of nodes V and set of edges E. Let θ be the surviving ratio of nodes that remain uninfected at the end of epidemics.

Given an input graph G, an SIS or SIR epidemic model, and a protection budget k, the goal is to find a set of nodes \(S \subseteq V\) such that θ is maximized, subject to the size of S being equal to the budget k. The protection is performed by removing all edges connected to the set of nodes S in graph G to get a new graph \(G^{(S)}\).
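As a minimal sketch of the protection operation in Definition 2, assume the undirected graph is stored as a dictionary that maps each node to the set of its neighbors. This representation, the helper names `protect` and `edge_count`, and the 5-node example are illustrative assumptions and are not part of the original formulation.

```python
def protect(adj, protected):
    """Return a copy of the graph with every edge incident to a protected node removed.

    adj: dict mapping node -> set of neighbor nodes (undirected graph).
    protected: iterable of nodes forming the protected set S.
    """
    protected = set(protected)
    new_adj = {}
    for node, neighbors in adj.items():
        if node in protected:
            new_adj[node] = set()                  # a protected node keeps no edges
        else:
            new_adj[node] = neighbors - protected  # drop edges leading to protected nodes
    return new_adj


def edge_count(adj):
    """Number of undirected edges (each edge is stored twice in adj)."""
    return sum(len(neighbors) for neighbors in adj.values()) // 2


# Hypothetical example: a path 0-1-2-3 plus the edge 1-4.
graph = {0: {1}, 1: {0, 2, 4}, 2: {1, 3}, 3: {2}, 4: {1}}
protected_graph = protect(graph, {1})
```

In this example, protecting node 1 removes three of the four edges, so an epidemic starting at node 0 or node 4 can no longer reach any other node.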

Definition 3. Dynamic Network

Let {1,⋯,T} be a finite set of discrete time steps. Let \(V^{D} = \{1,\cdots,n\}\) be the set of nodes which appear within time {1,⋯,T}. Let \(G_{t} = (V_{t},E_{t})\) be a graph representing the snapshot of the network at time t, where \(V_{t} \subseteq V^{D}\) is the subset of nodes observed at time t. A triple (t,u,v) represents an edge from vertex \(u \in V_{t}\) to \(v \in V_{t}\) at time t.

A dynamic network \(G^{D} = (V^{D},E^{D})\) is a series \(\langle G_{1},\cdots,G_{T} \rangle\) of static networks where each \(G_{t} = (V_{t},E_{t})\) is a snapshot of nodes and their edges at time t such that \(V^{D} = \bigcup _{t} V_{t}\). For the sake of consistency, the time during which the nodes are observed is assumed to be finite. Following the definitions by Habiba et al. (2010) and Bakker et al. (2018), the temporal length of \(G^{D}\) is assumed to be divided into discrete steps {1,⋯,T}. The non-trivial problem of appropriate time discretization is beyond the scope of our work.
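To make Definition 3 concrete, the following sketch groups a list of temporal edges (t,u,v) into per-step snapshots. The function name `build_snapshots`, the toy edge list, and the simplification that a node is observed at time t only if it has at least one edge at that time are assumptions made for illustration.

```python
from collections import defaultdict


def build_snapshots(temporal_edges, num_steps):
    """Group temporal edges (t, u, v) into snapshots G_t = (V_t, E_t) for t = 1..num_steps."""
    edges_by_time = defaultdict(set)
    for t, u, v in temporal_edges:
        edges_by_time[t].add(frozenset((u, v)))    # undirected edge
    snapshots = []
    for t in range(1, num_steps + 1):
        edges = edges_by_time.get(t, set())
        nodes = set()
        for edge in edges:
            nodes |= edge                          # V_t: nodes that appear at time t
        snapshots.append((nodes, edges))
    return snapshots


# Hypothetical temporal edge list over T = 3 time steps.
temporal_edges = [(1, 1, 2), (1, 2, 3), (2, 3, 4), (3, 1, 4), (3, 4, 5)]
snapshots = build_snapshots(temporal_edges, num_steps=3)

# V^D is the union of the node sets of all snapshots.
all_nodes = set().union(*(nodes for nodes, _ in snapshots))
```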

Definition 4. SIS Epidemic Model

Susceptible-Infected-Susceptible (SIS) is an epidemic model in which each node in a graph G with N nodes is in one of two states: susceptible or infected. Let \(\mathcal{S}(t)\) be the number of susceptible nodes, and let \(\mathcal{I}(t)\) be the number of infected nodes at time t. At each time step t, susceptible nodes can be infected by their infected neighbors with infection rate β. Also, each infected node can recover to the susceptible state with recovery rate δ. In the homogeneous case of well-mixed populations, this model can be formalized as the following non-linear differential equations:

$$ \frac{ds}{dt} = -\beta i s + \delta i, \quad \frac{di}{dt} = \beta i s - \delta i, $$
(1)

where \(s(t) = \mathcal{S}(t)/N\) and \(i(t) = \mathcal{I}(t)/N\) are the proportions of susceptible and infected nodes at time t, respectively. A continuous-time epidemic process under constant infection rate β and recovery rate δ on any network can be described by Markov theory. Following the definition of the SIS epidemic in networks by Pastor-Satorras et al. (2015), the individual-based mean-field (IBMF) and degree-based mean-field (DBMF) approaches can be used to analytically describe the SIS model.
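The node-level dynamics of Definition 4 can also be illustrated with a small stochastic simulation. The sketch below is an assumption-laden simplification, not the mean-field analysis cited above: it uses discrete, synchronous time steps and treats β and δ as per-step probabilities, and the function name `simulate_sis` is hypothetical.

```python
import random


def simulate_sis(adj, initially_infected, beta, delta, steps, seed=0):
    """Discrete-time SIS simulation; returns the fraction of susceptible nodes at the end.

    adj: dict mapping node -> set of neighbor nodes.
    beta: per-step probability that an infected neighbor infects a susceptible node.
    delta: per-step probability that an infected node returns to the susceptible state.
    """
    rng = random.Random(seed)
    infected = set(initially_infected)
    for _ in range(steps):
        next_infected = set()
        for node in adj:
            if node in infected:
                # The node stays infected unless it recovers with probability delta.
                if rng.random() >= delta:
                    next_infected.add(node)
            else:
                # Each infected neighbor transmits independently with probability beta.
                for neighbor in adj[node]:
                    if neighbor in infected and rng.random() < beta:
                        next_infected.add(node)
                        break
        infected = next_infected                   # synchronous update
    return 1 - len(infected) / len(adj)


# Hypothetical path graph 0-1-2-3 with node 0 attacked.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
surviving_ratio = simulate_sis(path, {0}, beta=0.3, delta=0.2, steps=20)
```

The returned value plays the role of the surviving ratio θ for one simulation run; in practice it would be averaged over many runs and attack sets.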

Definition 5. SIR Epidemic Model

In the Susceptible-Infected-Recovered (SIR) model, each node in graph G is in one of three states: susceptible, infected, or recovered. Each recovered node is resistant to any further infection. Let \(\mathcal{R}(t)\) be the number of recovered nodes at time t. Following the definition by Kermack and McKendrick (1927), for the homogeneous case of well-mixed populations, this model is formalized as:

$$ \frac{ds}{dt} = -\beta i s, \frac{di}{dt} = \beta i s - \delta i, \frac{dr}{dt} = \delta i, $$
(2)

where \(s(t) = \mathcal{S}(t)/N\), \(i(t) = \mathcal{I}(t)/N\), and \(r(t) = \mathcal{R}(t)/N\) are the proportions of susceptible, infected, and recovered nodes at time t, respectively. In addition to the IBMF and DBMF approaches, following the definition of the SIR epidemic in networks by Pastor-Satorras et al. (2015), we can analytically describe the SIR model using the generating function approach, where the probability that a link exists is related to the probability of transmission of the disease from an infected node to a connected susceptible one.
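The well-mixed SIR equations in Eq. (2) can be integrated numerically. The following is a minimal sketch using the forward Euler method; the function name `integrate_sir`, the step size, and the parameter values are illustrative assumptions rather than settings used in this paper.

```python
def integrate_sir(s0, i0, r0, beta, delta, dt, steps):
    """Forward Euler integration of ds/dt = -beta*i*s, di/dt = beta*i*s - delta*i, dr/dt = delta*i."""
    s, i, r = s0, i0, r0
    for _ in range(steps):
        ds = -beta * i * s
        di = beta * i * s - delta * i
        dr = delta * i
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
    return s, i, r


# Hypothetical setting: 1% of the population is initially infected.
s_end, i_end, r_end = integrate_sir(0.99, 0.01, 0.0, beta=0.5, delta=0.1, dt=0.01, steps=10000)
```

Because the three derivatives sum to zero, the proportions s, i, and r always add up to one, which serves as a quick sanity check on the integration.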

Definition 6. Multiple-turns Graph Protection Problem on Dynamic Networks

Let \(G^{D} = (V^{D},E^{D})\) be an undirected dynamic graph given as input, observed as a series \(\langle G_{1},\cdots,G_{T} \rangle\) of snapshots where each \(G_{t} = (V_{t},E_{t})\) represents a static network at time t. Let k be the protection budget, where \(k < |V^{D}|\), allocated over several turns according to the number of observed snapshots of \(G^{D}\). The protection budget for snapshot \(G_{t}\) at time t is denoted by \(k_{t}\) such that \(k = \sum _{t=1}^{T} k_{t}\).

Let S denote a set of k protected nodes from graph \(G^{D}\), with \(S = \bigcup _{t=1}^{T} S_{t}\), where \(S_{t}\) denotes the subset of \(k_{t}\) protected nodes of snapshot graph \(G_{t}\) at time t. Protection means removing the edges connected to the set of nodes \(S_{t}\) in graph \(G_{t}\) to get a new graph \(G_{t}^{(S)}\). Under random attack strategies, l nodes of graph \(G^{D}\) are randomly attacked (i.e., initialized as infected nodes), with \(l_{t}\) nodes attacked at each turn t such that \(l = \sum _{t=1}^{T} l_{t}\). We define θ as the ratio of surviving nodes of graph \(G^{D}\).

Given an input graph \(G^{D}\), an SIS or SIR epidemic model, and a protection budget k, the goal is to find S such that θ is maximized, subject to the size of S being equal to the budget k, i.e., solving the following combinatorial optimization problem:

$$ \begin{aligned} S^{*} &= \underset{S \subseteq V^{D}}{\text{arg\ max}}\ \theta \\ \text{s.t.}\ |S| &= k,\quad k = \sum_{t=1}^{T} k_{t} \end{aligned} $$
(3)

Related work

In this section, we review the relevant existing studies related to our work. We first review the fundamental work of epidemic modeling on dynamic networks, then we discuss some related work on graph protection strategy and its application in dynamic networks. Finally, some problems related to graph protection on dynamic networks are presented.

Fundamental work of epidemic modeling on dynamic networks

The properties of dynamic networks are essentially different from those of static networks. Braha and Bar-Yam (2006; 2009) found that the overlap between the centrality in dynamic networks and that in the aggregated (static) network is quite low. They also demonstrated that the static topology is unable to capture the dynamic properties of social networks. Hill and Braha (2010) propose a reinforced random walk approach to explain dynamic centrality phenomena and qualitatively reproduce the characteristic features of real-world networks. Those studies (Braha and Bar-Yam 2006; 2009; Hill and Braha 2010) provide an important foundation for understanding dynamic network properties.

Holme presents a systematic review of dynamic networks and discusses methods for topological and temporal structure analysis (Holme and Saramäki 2012; Holme 2015). More specifically, Pastor-Satorras et al. (2015) provide a fundamental review of epidemic models on dynamic networks, a topic that was also recently emphasized by Enright and Kao (2018).

Graph protection strategy and its application in dynamic networks

Graph protection strategies have mostly been studied under the assumption of a static network topology. Pastor-Satorras and Vespignani investigated the effect of random uniform and targeted high-degree immunization of individuals on homogeneous complex networks and scale-free networks (Pastor-Satorras and Vespignani 2002). NetShield (Tong et al. 2010) and NetShield+ (Chen et al. 2016) use the properties of matrix perturbation to find a set of nodes in static networks to be pre-emptively protected. Zhang and Prakash (2014; 2015) developed DAVA and DAVA-fast, two post-emptive polynomial-time heuristic methods which merge all infected nodes into a supernode by building a weighted dominator tree of the input network. NIIP (Song et al. 2015) extracts a maximum directed acyclic graph from a static network and then runs a Monte Carlo simulation to approximate the distribution of k over each time point t given the probability of a functional node getting infected. Wang et al. (2016; 2017) investigated rumor blocking in static networks by considering a dynamic Ising propagation model which consists of the individual tendency and the global popularity of the rumor. Under the constraint of user experience utility, they proposed the DRIMUX method to protect a set of nodes in a time interval t to limit the spreading of the rumor.

In dynamic networks, Prakash et al. proposed greedy algorithms, called NLDS, for pre-emptive protection of dynamic networks (Prakash et al. 2010). The methods come in different variants which select protected nodes based on the highest degree centrality, acquaintance (random neighbor), or the largest eigenvalue of the adjacency matrix. Liu and Gao investigated a different task of influence blocking in dynamic email networks (Liu and Gao 2011). They introduced an adaptive Autonomy-Oriented Computing approach which actively propagates vaccination patches to counter the spreading of virus-embedded emails. VAILDN was introduced by Zhan et al. (2017) as a post-emptive protection scheme. By merging all infected nodes into one supernode and building a weighted dominator tree of the modified input network, VAILDN determines the protected nodes by comparing the benefit of each sub-tree.

Table 2 compares our proposed method with the relevant existing work on graph protection strategies. To summarize, none of the existing works has investigated suppressing epidemic spreading by multiple-turns graph protection strategies on dynamic networks.

Table 2 Comparison of the proposed method to related existing work

Problems related to graph protection on dynamic networks

Several problems are related to our work. Epidemic containment using link deactivation (Bishop and Shames 2011; Van Mieghem et al. 2011; Matamalas et al. 2018) aims to deactivate a set of links (instead of nodes) to contain epidemic spreading in a network. Van Mieghem et al. (2011) propose a link removal approach to decrease the spectral radius of the graph during epidemic spreading. Bishop and Shames (2011) discuss a mechanism for reducing the speed of disease propagation. Matamalas et al. (2018) introduce an epidemic control approach based on deactivating the most important links transmitting the disease. These studies differ from ours in that they focus on link selection instead of node selection. Additionally, in real-world social networks, nodes represent users while links/edges represent friendship connections among users. For a network administrator, such as at Facebook or Twitter, it is more reasonable to temporarily deactivate a certain user in the case of rumor spreading than to deactivate part of the users’ friendship relations. Similarly, in human contact networks, it is more plausible to immunize an important person than to restrict a combination of several peer-to-peer interactions.

Network dismantling (Braunstein et al. 2016; Ren et al. 2018) is another problem related to our work. It is the problem of determining a minimum set of nodes whose removal breaks the network structure into subcritical connected components at minimum cost. Braunstein et al. (2016) provide the insightful finding that the dismantling problem is an intrinsically collective problem and that optimal dismantling sets cannot be viewed as a collection of individually well-performing nodes. Ren et al. (2018) proposed a method based on the spectral properties of a node-weighted Laplacian operator to solve the problem.

The influence maximization problem on dynamic networks is also related to our work. While influence maximization aims to maximize influence spreading (information diffusion) (Tong et al. 2017; Murata and Koga 2018), graph protection tries to restrain and contain such spreading processes. Tong et al. (2017) introduced a greedy adaptive seeding strategy as an efficient heuristic for maximizing influence in dynamic social networks. Murata and Koga (2018) proposed three new methods for solving the problem, which extend methods designed for static networks.

Proposed methods

In this section, we propose new methods for multiple-turns graph protection problem in dynamic networks, namely ReProtect and ReProtect-p. To restrain the spreading of epidemic in dynamic networks, we divide the protection budget wisely into several turns. The protected nodes are selected in each turn according to the currently observed temporal snapshot of dynamic network. Using the multiple-turns protection, we aim to address the changing of network structure and incoming rumor or virus attacks during the temporal transition in dynamic networks.

Figure 2 illustrates our proposed method in each turn, which takes a temporal snapshot of a dynamic network at time t as input and determines the set of protected nodes. In each given turn, we determine the most critical set of nodes of the input network. A node is considered critical if protecting it is expected to contribute to blocking large-scale epidemic spreading (Chen et al. 2016; Wang et al. 2016, 2017).

Fig. 2

Schematic illustration of the proposed method. Given a temporal snapshot of a dynamic network at time t, our proposed method selects a set of protected nodes

The main idea of our method can be described in the following key points:

1. Minimum vertex cover (MVC)

First, we aim to find the set of the most critical nodes in the input network. Many previous studies suggest that a given critical node criterion works best for a certain type of network structure. For instance, degree centrality is most suitable for dense and highly centralized networks (Lawyer 2015; Chen et al. 2016), while betweenness centrality and connectivity are well suited to clustered networks that contain graph bridges (Italiano et al. 2012; Khan et al. 2015; Lawyer 2015).

We propose to consider a minimum vertex cover (MVC) as the criterion to determine the set of critical nodes of a network. Given a graph G=(V,E), a vertex cover is a subset of nodes \(V_{c} \subseteq V\) such that every edge of G has at least one endpoint in \(V_{c}\). Hence, this set of nodes \(V_{c}\) covers every edge in G. A minimum vertex cover is a vertex cover with the smallest possible number of nodes. Every graph trivially has a vertex cover with \(V_{c} = V\). Figure 3a shows a vertex cover, and Fig. 3b shows a minimum vertex cover of the same graph. The vertex cover decision problem is NP-complete, and the minimum vertex cover optimization problem is NP-hard.
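The two notions can be checked directly on a small graph. The sketch below is for illustration only: `is_vertex_cover` tests the covering condition, and `minimum_vertex_cover` performs an exhaustive search whose cost grows exponentially with the number of nodes, which is why the approximation described later is needed for real networks. Both function names and the example graph are assumptions.

```python
from itertools import combinations


def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in cover."""
    cover = set(cover)
    return all(u in cover or v in cover for u, v in edges)


def minimum_vertex_cover(nodes, edges):
    """Exhaustive search over subsets in increasing size; feasible for tiny graphs only."""
    nodes = sorted(nodes)
    for size in range(len(nodes) + 1):
        for candidate in combinations(nodes, size):
            if is_vertex_cover(edges, candidate):
                return set(candidate)
    return set(nodes)  # not reached: the full node set is always a cover


# Hypothetical graph: a star centered at node 0 plus the edge 3-4.
nodes = {0, 1, 2, 3, 4}
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
mvc = minimum_vertex_cover(nodes, edges)
```

Here the full node set is a (trivial) vertex cover, while the minimum vertex cover found by the search contains only two nodes.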

Fig. 3

Vertex cover and minimum vertex cover shown on the same underlying graph. Red colored nodes are the nodes in the cover; together they cover all edges. a Vertex cover (not minimum). b Minimum vertex cover

As shown in Fig. 2, our input is a static network \(G_{t}\), the observed snapshot of the dynamic network at time t. We aim to completely cover all the connections in \(G_{t}\), which are represented by edges, with the smallest possible number of nodes. The size definition of MVC is intuitively aligned with the limited size of the protection budget in the graph protection problem. Following the definition of the graph protection problem, we can show the role of MVC as the protection threshold of a network.

Theorem 1

(Protection Threshold) The protection threshold is the minimum required size of S to disconnect graph G such that no propagation may occur among nodes. Given an undirected connected graph G=(V,E), a minimum vertex cover of G is also a protection threshold of G.

Proof

A vertex cover \(V_{c}\) of G is a subset of nodes \(V_{c} \subseteq V\) such that \((u,v) \in E \Rightarrow u \in V_{c} \vee v \in V_{c}\). A minimum vertex cover \(V_{c}^{*}\) is a \(V_{c}\) with the smallest size as follows:

$$ V_{c}^{*} = \underset{V_{c}}{\text{arg\ min}} |V_{c}| $$
(4)

Since all edges in graph G are covered by \(V_{c}^{*}\):

$$ (u,v) \in E \Rightarrow u \in V_{c}^{*} \vee v \in V_{c}^{*}, $$
(5)

then by removing all corresponding edges in G connected to \(V_{c}^{*}\) we get \(G^{(V_{c}^{*})} = \left (V_{c}^{*}, E^{(V_{c}^{*})}\right)\). Thus, \(G^{(V_{c}^{*})}\) has no edge, i.e., \(E^{(V_{c}^{*})} = \{\}, \left |E^{(V_{c}^{*})}\right | = 0\).

According to Definitions 1 and 2, protecting the set S of nodes in G means removing all edges of G connected to S. This can be written as a minimax function that minimizes the size of S while maximizing the number of edges in G that are covered, as follows:

$$ S^{*} = \underset{S}{\text{arg\ min}} |\underset{E^{(S)}}{\text{arg\ max}} |E^{(S)}|| $$
(6)

Consequently, by protecting the minimum vertex cover \(V_{c}^{*}\), i.e., \(S = V_{c}^{*}\), the graph \(G^{(S)}\) has no edge. Hence, a minimum vertex cover \(V_{c}^{*}\) of G is also a protection threshold of G. □

2. Top-k highest degree MVC

Recall that MVC is a set of nodes without any ordering. Given a budget k, different choices of k nodes from \(V_{c}^{*}\) result in different protected sets. Moreover, not all of the nodes in MVC should have the same priority to be protected within a limited budget. We consider that the more connected a node v is to its neighbors in G, the more critical it is to protect v. Hence, after obtaining the MVC nodes from the input network, we reorder them by their degree within the input network.

Suppose that at time t we are given an input temporal snapshot graph \(G_{t}\) and a protection budget \(k_{t}\). Under the constraint of the limited protection budget \(k_{t}\), we select the top-\(k_{t}\) MVC nodes based on their degree within graph \(G_{t}\).
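The degree-based reordering can be sketched in a few lines. The function name `top_k_by_degree`, the tie-breaking rule (smaller node identifier first), and the example snapshot are assumptions made for illustration; the paper does not specify how ties are broken.

```python
def top_k_by_degree(adj, cover, k):
    """Return the k cover nodes with the highest degree in the snapshot.

    adj: dict mapping node -> set of neighbor nodes in the snapshot.
    cover: set of vertex cover nodes obtained for the snapshot.
    Ties are broken by node identifier (an assumption for determinism).
    """
    ranked = sorted(cover, key=lambda v: (-len(adj[v]), v))
    return ranked[:k]


# Hypothetical snapshot whose minimum vertex cover is {0, 4}.
snapshot = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 4}, 4: {3, 5}, 5: {4}}
mvc = {0, 4}
protected = top_k_by_degree(snapshot, mvc, k=1)
```

With a budget of one node, node 0 (degree 3) is protected before node 4 (degree 2).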

3. Reinforcement learning as solution approximation

Despite the protection threshold guarantee of MVC, finding the MVC nodes of a graph is NP-Hard (Hartman and Weigt 2006). We therefore consider a reinforcement learning (RL) approach to approximate the solution. The RL approach aims to obtain an optimal solution by maximizing the cumulative reward without being given any pre-defined deterministic policy (Mnih et al. 2015; Khalil et al. 2017). This allows us to exploit the best known policy while also exploring unknown policies to obtain an optimal solution.

More specifically, we leverage the n-step fitted Q-Learning (Khalil et al. 2017) to obtain MVC approximation with an efficient training process and scalable implementation. Hence, our proposed methods take the advantage of n-step Q-Learning (Sutton and Barto 1998) and fitted Q-iteration (Riedmiller 2005).

We let the n-step fitted Q-Learning iteratively learn to construct a vertex cover (Vc) solution of the input network. We define the RL environment as follows:

  • State (\(\mathbb {S}\)): set of currently selected Vc nodes from input graph

  • Action (\(\mathbb {A}\)): add new node v to vertex cover set \(\mathbb {S}\)

  • Reward (\(\mathbb {R}\)): -1, as our goal is to get the minimum size of vertex cover, we set a penalty for adding a new node into Vc set.

  • Termination criteria: all edges are covered
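The environment listed above can be sketched as a small class. This is an illustrative stand-in, not the implementation used in our experiments: the class name `VertexCoverEnv`, its method names, and the choice to represent the state as a frozen set of selected nodes are assumptions.

```python
class VertexCoverEnv:
    """Minimal vertex cover environment: state = selected nodes, action = add one node."""

    def __init__(self, edges):
        self.edges = [tuple(edge) for edge in edges]
        self.reset()

    def reset(self):
        """Start from an empty partial cover and return the initial state."""
        self.selected = set()
        return frozenset(self.selected)

    def uncovered_edges(self):
        """Edges with neither endpoint in the current partial cover."""
        return [(u, v) for u, v in self.edges
                if u not in self.selected and v not in self.selected]

    def valid_actions(self):
        """Nodes that are not yet part of the partial cover."""
        nodes = {node for edge in self.edges for node in edge}
        return sorted(nodes - self.selected)

    def step(self, node):
        """Add a node to the cover; every addition is penalized with reward -1."""
        if node in self.selected:
            raise ValueError("node is already in the partial cover")
        self.selected.add(node)
        reward = -1
        done = not self.uncovered_edges()    # terminate once all edges are covered
        return frozenset(self.selected), reward, done


# Hypothetical path graph 0-1-2-3.
env = VertexCoverEnv([(0, 1), (1, 2), (2, 3)])
```

Since each step costs -1, the cumulative reward of an episode equals the negative size of the constructed cover, so maximizing the return corresponds to finding a small vertex cover.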

To quantify how good it is to take an action \(a \in \mathbb {A}\) in a given state \(s \in \mathbb {S}\), Q-Learning uses the Q-Function (Watkins 1989). The Q-Function evaluates a pair of state and action and maps it to a single value, called the Q-Value, using the following Bellman optimality equation:

$$ Q(s,a) = r + \lambda \max_{a^{\prime}} Q(s^{\prime},a^{\prime}) $$
(7)

where \(s \in \mathbb {S}\) is a given state, \(a \in \mathbb {A}\) is the current action, r is the current reward, λ is the discount factor of the future rewards, \(s' \in \mathbb {S}\) is the next state, and \(a' \in \mathbb {A}\) is the next action. The calculation of Q-Function is performed and updated iteratively for each possible pair of state and action. The result of all Q-Value is stored in a table, called Q-Table. The best action for a given state is indicated by the highest Q-Value.
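For intuition, the tabular update in Eq. (7) can be sketched as follows. The function name `bellman_update`, the dictionary-based Q-Table, and the convention that a terminal state (one with no further actions) contributes a future value of zero are assumptions made for this example.

```python
from collections import defaultdict


def bellman_update(q_table, state, action, reward, next_state, next_actions, discount):
    """Apply Eq. (7): Q(s, a) = r + lambda * max over a' of Q(s', a').

    q_table: dict mapping (state, action) -> Q-Value, defaulting to 0.0.
    next_actions: actions available in next_state; empty for a terminal state.
    """
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    q_table[(state, action)] = reward + discount * best_next
    return q_table[(state, action)]


q_table = defaultdict(float)

# Hypothetical two-step episode on the path 0-1-2-3: select node 1, then node 2.
empty, after_one = frozenset(), frozenset({1})
last_value = bellman_update(q_table, after_one, 2, -1, frozenset({1, 2}), [], 0.9)
first_value = bellman_update(q_table, empty, 1, -1, after_one, [2], 0.9)
```

Updating the last step first lets the value of the final action propagate back to the first one within a single pass.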

To obtain the maximum expected cumulative reward achievable from a given pair of state and action, we can compute the optimal Q-Function, denoted as \(Q^{*}\), using the following equation:

$$ Q^{*}(s,a) = \max \mathbb{E} [\sum_{t \geq 0} \lambda^{t} r_{t}|s_{0} = s, a_{0} = a] $$
(8)

where s0 and a0 are the initial state and action respectively, t indicates a step which consists of: observe a state, perform an action, retrieve a reward, and observe the next state.

As the number of all possible pairs of state and action can be very large, storing every Q-Value in a Q-Table is not efficient. In particular, for a large input network, using a Q-Table is computationally infeasible and resource-consuming. A non-linear function approximator can instead be used to estimate the optimal Q-Function in Eq. (8) such that:

$$ Q(s,a,\Psi) \approx Q^{*}(s,a) $$
(9)

where Ψ denotes the parameters (weights) of our non-linear function approximator Q(s,a,Ψ). A neural network or a kernel function can be used as the non-linear function approximator of the Q-Function (Sutton and Barto 1998).

Recent studies show that neural networks or convolutional neural networks achieve state-of-the-art results as function approximators (Mnih et al. 2015; Sutton and Barto 1998). The neural network architecture also speeds up learning in finite problems, because it can generalize from earlier experiences to previously unseen states (Mnih et al. 2015). In this paper, we propose a convolutional neural network as the function approximator of the optimal Q-Function. Recall that the Q-Function takes a given state and action as input and outputs a Q-Value. The state is the given input graph with the currently selected \(V_{c}\) nodes. The actions are the nodes that can still be added to the current \(V_{c}\). In the convolutional neural network architecture, the input should represent both the state and the action. Hence, we need the same fixed-length feature representation for the graph and for each of its nodes. Therefore, in our construction of the minimum vertex cover, the function approximator in Eq. (9) is denoted as:

$$ \hat{Q}(h(\mathbb{S}),v,\Psi) $$
(10)

where \(h(\mathbb {S})\) and v represent the fixed-length feature representation of the state \(\mathbb {S}\) and an action of adding node v using the neural network set of weights Ψ.

4. Graph embeddings as feature-based representations

We leverage an efficient and scalable graph embedding technique, called Structure2Vec (Dai et al. 2016; Khalil et al. 2017), to embed the input graph and each of its nodes. This graph embedding technique computes a d-dimensional feature embedding \(\mu_{v}\) for each node \(v \in V\), given the current partial solution \(\mathbb {S}\).

Given a temporal snapshot graph \(G_{t}\), we embed each node v by constructing a d-dimensional embedding \(\mu_{v}\). All entries of \(\mu _{v}^{(0)}\) are initialized to zero, and for every \(v \in V\) we update the embedding iteratively over T iterations as follows:

$$ \mu_{v}^{(t+1)}= \text{ReLU }(\psi_{1} x_{v} + \psi_{2} \sum_{u \in N(v)} \mu_{u}^{(t)} + \psi_{3} \sum_{u \in N(v)} \text{ReLU} (\psi_{4} w(u,v))), $$
(11)

where \(x_{v}\) is the tag of node v, equal to 1 if v has already been selected and 0 otherwise; N(v) is the set of neighbors of node v in graph \(G_{t}\); \(\sum _{u \in N(v)} \mu _{u}^{(t)} \) aggregates the features of the neighbors of node v; and w(u,v) is the weight of the edge between u and v, which accounts for weighted connections in weighted graphs. ψ1, ψ2, ψ3, and ψ4 are the function parameters (weights), specified as \(\psi _{1} \in \mathbb {R}^{d}\), \(\psi _{2} \in \mathbb {R}^{d \times d}\), \(\psi _{3} \in \mathbb {R}^{d \times d}\), and \(\psi _{4} \in \mathbb {R}^{d}\). ReLU is the rectified linear unit activation function, applied elementwise, where ReLU(x)=x if x>0 and 0 otherwise.
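The update in Eq. (11) can be traced with a short sketch written in plain Python lists. It is an illustration under stated assumptions rather than our actual implementation: the function names, the default edge weight of 1.0 for unweighted graphs, and the hand-picked parameter values (identity and zero matrices) are all hypothetical and chosen so the result can be verified by hand.

```python
def relu(vec):
    """Elementwise rectified linear unit."""
    return [x if x > 0 else 0.0 for x in vec]


def mat_vec(mat, vec):
    """Multiply a d x d matrix (list of rows) by a length-d vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in mat]


def add(a, b):
    """Elementwise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def embed_nodes(adj, selected, params, dim, iterations, weight=None):
    """Run the synchronous update of Eq. (11) for the given number of iterations."""
    psi1, psi2, psi3, psi4 = params
    weight = weight or {}                          # frozenset({u, v}) -> edge weight
    mu = {v: [0.0] * dim for v in adj}             # all embeddings start at zero
    for _ in range(iterations):
        new_mu = {}
        for v in adj:
            x_v = 1.0 if v in selected else 0.0    # node tag
            neighbor_sum = [0.0] * dim
            edge_sum = [0.0] * dim
            for u in adj[v]:
                w = weight.get(frozenset((u, v)), 1.0)
                neighbor_sum = add(neighbor_sum, mu[u])
                edge_sum = add(edge_sum, relu([p * w for p in psi4]))
            pre_activation = add(add([p * x_v for p in psi1], mat_vec(psi2, neighbor_sum)),
                                 mat_vec(psi3, edge_sum))
            new_mu[v] = relu(pre_activation)
        mu = new_mu
    return mu


# Hypothetical parameters with d = 2 on a single edge 0-1, node 0 already selected.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = ([1.0, 1.0], identity, zeros, [1.0, 1.0])
graph = {0: {1}, 1: {0}}
mu_one = embed_nodes(graph, selected={0}, params=params, dim=2, iterations=1)
mu_two = embed_nodes(graph, selected={0}, params=params, dim=2, iterations=2)
```

After one iteration only the selected node has a non-zero embedding; after the second iteration its neighbor receives that information, which shows how T iterations propagate features over T hops.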

Here we explain how to obtain the function \(\hat {Q}(h(\mathbb {S}_{t}),v; \Psi)\) in Eq. (10). Once the embedding \(\mu_{v}\) for each node \(v \in V\) has been calculated using Eq. (11) for T iterations, we get \(\mu _{v}^{(T)}\). The pooled embedding of the entire graph \(G_{t}\) is then given by

$$ \sum_{u \in V} \mu_{u}^{(T)} $$
(12)

Then we can use it to estimate the optimal Q-Function in Eq. (10) as follows:

$$ \hat{Q}(h(\mathbb{S}),v;\Psi) = \psi_{5}^{\top} \text{ReLU }\left(\text{concat} \left(\psi_{6} \sum_{u \in V} \mu_{u}^{(T)}, \psi_{7} \mu_{v}^{(T)}\right)\right), $$
(13)

where \(\sum _{u \in V} \mu _{u}^{(T)}\) is the pooled embedding of the entire graph. ψ5, ψ6, and ψ7 are the neural network parameters (weights), specified as \(\psi _{5} \in \mathbb {R}^{2d}\), \(\psi _{6} \in \mathbb {R}^{d \times d}\), and \(\psi _{7} \in \mathbb {R}^{d \times d}\).
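A minimal sketch of how Eq. (13) combines the pooled graph embedding with a node embedding is given below. The function name `q_hat`, the example embeddings, and the parameter values are hypothetical and were chosen so that the output can be checked by hand.

```python
def relu(vec):
    """Elementwise rectified linear unit."""
    return [x if x > 0 else 0.0 for x in vec]


def mat_vec(mat, vec):
    """Multiply a d x d matrix (list of rows) by a length-d vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in mat]


def q_hat(embeddings, node, psi5, psi6, psi7):
    """Evaluate Eq. (13) for one candidate node.

    embeddings: dict mapping node -> final d-dimensional embedding.
    psi5: vector of length 2d; psi6, psi7: d x d matrices (lists of rows).
    """
    dim = len(psi6)
    pooled = [sum(mu[j] for mu in embeddings.values()) for j in range(dim)]
    state_part = mat_vec(psi6, pooled)               # surrogate for the state
    action_part = mat_vec(psi7, embeddings[node])    # surrogate for the action
    hidden = relu(state_part + action_part)          # list concatenation, length 2d
    return sum(w * h for w, h in zip(psi5, hidden))


# Hypothetical embeddings and parameters with d = 2.
identity = [[1.0, 0.0], [0.0, 1.0]]
embeddings = {0: [1.0, 0.0], 1: [0.0, 2.0]}
psi5 = [1.0, 1.0, 1.0, 1.0]
score_node_0 = q_hat(embeddings, 0, psi5, identity, identity)
score_node_1 = q_hat(embeddings, 1, psi5, identity, identity)
```

The state part is shared by all candidate nodes, so differences between the scores come only from the node-specific part.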

In summary, the pooled embedding of the entire graph is used as a surrogate to represent the state, and the embedding of each node is used as a surrogate to represent the action. The function \(\hat {Q}(h(\mathbb {S}),v)\) depends on the collection of seven parameters \(\Psi = \{\psi _{i}\}_{i=1}^{7}\), which are learned during the training phase and evaluated during the evaluation phase. Figure 4 illustrates the architecture of the neural network used in this paper.

Fig. 4

Architecture illustration of the neural network used in our work. Green colored shape represents the convolutional layer. Yellow colored shape represents ReLU activation function. Blue colored shape represents the fully connected layer

a. Training Phase

Algorithm 1 illustrates our proposed training phase. In each training iteration, our method returns the set of neural network parameters Ψ that successfully obtains Vc from graph G. In line 5, we specify how to select a new node by balancing exploration and exploitation. Here, exploration means selecting a random node with probability ε. Exploitation means aiming for the maximum expected cumulative reward, i.e., selecting a node that maximizes the function \(\hat {Q}(h(\mathbb {S}_{t}),v; \Psi)\), where \(h(\mathbb {S}_{t})\) is the embedding of state \(\mathbb {S}\) at step t. The exploration probability ε is set to decrease linearly from 1.0 to 0.05 with the iteration step. To train the neural network efficiently, we perform batch processing as described in line 9.
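The exploration and exploitation rule described above can be sketched as follows. This is a minimal illustration rather than the code of Algorithm 1: the names `epsilon_at` and `select_node` are hypothetical, and the assumption that ε reaches 0.05 exactly at the last training step is ours, since the text only states that the decay is linear.

```python
import random
from typing import Callable, Hashable, Iterable


def epsilon_at(step: int, total_steps: int,
               eps_start: float = 1.0, eps_end: float = 0.05) -> float:
    """Linearly interpolate epsilon from eps_start (step 0) to eps_end (last step)."""
    if total_steps <= 1:
        return eps_end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_node(candidates: Iterable[Hashable],
                q_value: Callable[[Hashable], float],
                epsilon: float,
                rng: random.Random) -> Hashable:
    """Epsilon-greedy choice: random node with probability epsilon, else argmax of Q."""
    pool = list(candidates)
    if rng.random() < epsilon:
        return rng.choice(pool)          # exploration
    return max(pool, key=q_value)        # exploitation
```

Here `q_value` stands in for the learned function \(\hat {Q}(h(\mathbb {S}_{t}),v; \Psi)\) evaluated on the current state.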

The loss function to be minimized during learning is as follows:

$$ \left(y - \hat{Q}(h(\mathbb{S}_{t}), v_{t}; \Psi)\right)^{2} $$
(14)

where \(y = \sum _{i=0}^{n-1} r(S_{t+i},v_{t+i}) + \lambda \max _{v'} \hat {Q}\left (h(\mathbb {S}_{t+n}), v'; \Psi \right)\) and n is the number of step updates.
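To make the n-step target concrete, the sketch below computes y and the squared loss of Eq. (14) for a single training sample. The function names are hypothetical, and the choice of a zero bootstrap value when no candidate node remains is an assumption that the text does not specify.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float],
                  next_q_values: Sequence[float],
                  discount: float) -> float:
    """y = sum of the n observed rewards + discount * max_v' Q(h(S_{t+n}), v')."""
    # Assumption: when no candidate node is left, the bootstrap term is 0.
    bootstrap = max(next_q_values) if next_q_values else 0.0
    return sum(rewards) + discount * bootstrap


def squared_loss(y: float, q_pred: float) -> float:
    """Eq. (14): squared difference between the target and the current estimate."""
    return (y - q_pred) ** 2
```

For example, with n = 5 rewards of -1 each (an illustrative value), next-state estimates of -3 and -2, and λ = 0.5, the target is -5 + 0.5 · (-2) = -6.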

b. Evaluation Phase

Algorithm 2 illustrates the evaluation phase of our proposed method. To obtain the best-trained set of neural network parameters (weights) Ψ, we evaluate the training result against the available snapshots of the given dynamic graph GD. We then use this set of parameters in the testing simulation of graph protection.

c. Testing Phase

Algorithm 3 shows the testing phase of the multiple-turns graph protection strategy on dynamic networks. We are given an input snapshot of graph Gt and a budget kt. Each node in Gt is embedded into a d-dimensional feature vector, where d is equal to the embedding size used during training in Algorithm 1. The minimum vertex cover of Gt is then constructed using the best-trained set of neural network parameters Ψ obtained from Algorithm 2. Finally, we obtain a set S of the top-kt degree-ordered MVC nodes to be protected in the current temporal snapshot of graph Gt.
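The final selection step can be sketched as below, assuming a vertex cover of the snapshot has already been produced (by the trained network or by any other means). The function names and the tie-breaking rule are illustrative assumptions; unweighted node degree is used for the ordering.

```python
from typing import Dict, Hashable, Iterable, List, Set


def is_vertex_cover(adjacency: Dict[Hashable, Set[Hashable]],
                    cover: Iterable[Hashable]) -> bool:
    """True if every edge has at least one endpoint in the cover."""
    chosen = set(cover)
    return all(u in chosen or v in chosen
               for u, nbrs in adjacency.items() for v in nbrs)


def top_k_by_degree(adjacency: Dict[Hashable, Set[Hashable]],
                    cover: Iterable[Hashable], k: int) -> List[Hashable]:
    """Order cover nodes by decreasing degree and keep the first k."""
    # Assumption: ties are broken by the string form of the node label.
    ranked = sorted(cover, key=lambda v: (-len(adjacency[v]), str(v)))
    return ranked[:k]
```

When the budget kt is at least the size of the cover, the whole cover is returned and every edge of the snapshot has a protected endpoint.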

We also propose the ReProtect-p method, a variant of ReProtect that is trained on a perturbed version of each available snapshot of the dynamic network. The perturbation is performed by removing edges probabilistically from the snapshot graph. Specifically, for each edge, we generate a random number; if the edge weight is smaller than the generated random number, the edge is removed. We introduce this variant to provide more variety in the training data and to avoid possible overfitting.
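The perturbation rule can be illustrated with the short sketch below. It assumes that edge weights lie in [0, 1] and that the random number is drawn uniformly from [0, 1); neither range is stated in the text, and the function name is hypothetical.

```python
import random
from typing import Hashable, List, Sequence, Tuple

WeightedEdge = Tuple[Hashable, Hashable, float]


def perturb_snapshot(edges: Sequence[WeightedEdge],
                     rng: random.Random) -> List[WeightedEdge]:
    """Drop each edge whose weight is smaller than a fresh uniform random number."""
    kept = []
    for u, v, weight in edges:
        if weight < rng.random():   # assumption: uniform draw on [0, 1)
            continue                # edge removed from the perturbed snapshot
        kept.append((u, v, weight))
    return kept
```

Under these assumptions an edge of weight w survives with probability w, so strong ties are mostly preserved while weak ties are frequently dropped.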

Computational complexity analysis

Based on Algorithm 3, we analyze the computational complexity of our proposed ReProtect method. The cost of step 1, which initializes the empty set S, is constant. Steps 2 and 3 construct an approximate MVC set of graph Gt, which has complexity O(p·M) according to the analyses by Dai et al. (2016) and Khalil et al. (2017). Here p is the constant number of node testing steps, equal to the number of nodes divided by the number of step updates in Q-Learning, and M is the number of edges. In n-step Q-Learning, we update the value of each action based on the rewards of taking a sequence of n consecutive actions; n is called the number of step updates. For example, if the number of nodes in graph Gt is 500 and the number of step updates is 5, then p is a constant equal to 100. Hence p ranges from 1 to the number of nodes in graph Gt.

Obtaining the degree-ordered MVC nodes in step 4 takes O(N·logN) time on average using QuickSort, where N is the number of nodes in Gt. Therefore, the total computational complexity of our ReProtect method is O(p·M+N·logN). ReProtect and ReProtect-p differ only in their training process, so the computational complexity of the ReProtect-p method is also O(p·M+N·logN).
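The arithmetic behind p and the combined cost term can be checked with a few lines of code. This is only a back-of-the-envelope sketch: rounding p up when the division is not exact and using a base-2 logarithm are assumptions, and constant factors hidden by the O-notation are ignored.

```python
import math


def node_testing_steps(num_nodes: int, n_step: int) -> int:
    """p = number of nodes divided by the number of step updates (rounded up)."""
    return math.ceil(num_nodes / n_step)


def reprotect_cost(num_nodes: int, num_edges: int, n_step: int) -> float:
    """Evaluate p * M + N * log2(N) as a rough operation count."""
    p = node_testing_steps(num_nodes, n_step)
    return p * num_edges + num_nodes * math.log2(num_nodes)
```

The two extremes n = 1 and n = N give p = N and p = 1 respectively, which matches the stated range of p.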

Evaluation

In this section, we provide experimental evaluations of our proposed methods. The goal of this evaluation is to answer the following questions:

1. (Effectiveness) How effective are the proposed methods in restraining epidemic spreading in both synthetic and real-world dynamic networks? We define the measurement of effectiveness using the surviving ratio (θ) of nodes in dynamic network GD at the end of epidemics.

2. (Scalability) Are the proposed methods scalable with respect to the changing of graph size (in terms of the number of nodes) and different protection budget size (k)?

3. (Sensitivity Analysis) How effective are our proposed methods under different values of the epidemic parameters, such as the infection rate (β) and the recovery rate (δ)?

Dataset

We evaluate our proposed methods on various real-world dynamic network datasets, which are summarized in Table 3.

Table 3 Statistics of dynamic network datasets
  • Dutch College dataset is a directed network of friendship ratings among 32 university freshmen (Van de Bunt et al. 1999). Each student was asked to rate the others at seven different time points.

  • Hospital dataset contains the temporal network of human contacts between patients and health-care workers in a hospital ward in Lyon, France (Vanhems et al. 2013). Data was collected in December 2010.

  • Hypertext 2009 dataset is the network of contacts of the attendees of the ACM Hypertext 2009 conference (Stehlé et al. 2011). In the network, a node represents a conference visitor, and an edge represents a face-to-face contact.

  • PrimarySchool dataset contains the temporal network of contacts between teachers and children used in a study published in BMC Infectious Diseases in 2014 (Gemmetto et al. 2014; Stehlé et al. 2011).

  • Highschool 2013 dataset contains the temporal network of contacts between students in a high school in Marseilles, France (Mastrandrea et al. 2015). The data was collected in December 2011 and November 2012.

  • Infectious dataset is the network of face-to-face contacts between people during the Dublin Science Gallery 2009 exhibition (Isella et al. 2011).

  • Email dataset was obtained from the email communication between institution members (the core) from a large European research institution (Paranjape et al. 2017). A directed edge (u,v,t) means that person u sent an e-mail to person v at time t in the network.

Comparison methods

Recall that, to the best of our knowledge, no previous work has been proposed to handle the multiple-turns graph protection problem on dynamic networks. Here, we compare the performance of the following methods:

  • None: simulates the condition without any protection.

  • GreedyMVC: approximates the set of MVC nodes of the input graph by greedily selecting the uncovered edge with the maximum sum of degrees of its endpoints (Khalil et al. 2017). It then protects k nodes from this unordered MVC set.

  • Degree (Prakash et al. 2010): protects k highest degree nodes of the current snapshot of the dynamic network. This method represents the concept of NLDS-Degree by Prakash et al. (2010).

  • Betweenness: protects k nodes with the highest betweenness centrality of the current snapshot of the dynamic network.

  • NetShield+ (Tong et al. 2010; Chen et al. 2016; Prakash et al. 2010): aims to protect a set of k nodes by considering the largest eigenvalue of the adjacency matrix. This method represents the stronger variant of the eigendecomposition-based methods by Chen et al. (2016) and NLDS-EigenValue by Prakash et al. (2010).

  • GraphShield (Wijayanto and Murata 2017): protects k nodes by taking into account the role of graph connectivity and degree centrality.

  • ApproxDegree: simulates the 2-approximation algorithm for obtaining the MVC nodes (Chakrabarti; Hartman and Weigt 2006). We add degree ordering to this method so that it protects the top-k highest-degree MVC nodes.

  • ReProtect and ReProtect-p: are our proposed methods.
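For readers unfamiliar with the two MVC-based baselines in the list above, the following sketch shows one common way to realize them. It is not the code used in the experiments: the function names are hypothetical, the 2-approximation is the textbook maximal-matching variant, and the greedy rule uses degrees computed once on the input graph, which is an assumption.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def two_approx_vertex_cover(edges: Sequence[Edge]) -> Set[Hashable]:
    """Textbook 2-approximation: add both endpoints of each still-uncovered edge."""
    cover: Set[Hashable] = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


def greedy_mvc(edges: Sequence[Edge]) -> Set[Hashable]:
    """Repeatedly pick the uncovered edge with the largest sum of endpoint degrees."""
    degree: Dict[Hashable, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    cover: Set[Hashable] = set()
    uncovered: List[Edge] = list(edges)
    while uncovered:
        u, v = max(uncovered, key=lambda e: degree[e[0]] + degree[e[1]])
        cover.update((u, v))
        uncovered = [(a, b) for a, b in uncovered
                     if a not in cover and b not in cover]
    return cover
```

In both cases the returned set covers every edge. ApproxDegree additionally orders the cover by node degree before taking the top k nodes, whereas GreedyMVC takes k nodes from the unordered set.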

Experimental setting

In the training phase, we use an embedding dimension of 64, a batch size of 64, and 5 embedding iterations, as suggested in Structure2Vec (see footnote 1) (Dai et al. 2016). We set the n-step to 5, the learning rate to 0.0001, and the number of training iterations to 100,000. These three settings are commonly applied in n-step Q-Learning (Sutton and Barto 1998). In the evaluation phase, we use 100 evaluation iterations.

For a fair comparison, unless specified otherwise, all methods are simulated under the same settings: infection rate β=0.8, recovery rate δ=0.2, and an initial number of attacked nodes (l) equal to the protection budget (k). We simulate l=k, lt=kt, and k1=k2=⋯=kt. Random attacks are used in all experiments. These settings apply to the evaluation under both the SIS and SIR epidemic models.

All results presented in this section are the average of multiple simulations. Unless specified otherwise, we take the average from 100 simulations for each result. The initial condition is all nodes susceptible except the attacked ones, which are infected.

For the SIS model, we let the epidemic spreading reach the stationary state before moving to the next snapshot of the network. For the SIR model, we count the ratio of surviving nodes at the highest outbreak point, right before the final regime of the epidemic, as suggested by Pastor-Satorras et al. (2015). For continuity in the SIR model, we restart the epidemic spreading in the new snapshot after the final regime of epidemic spreading in the previous snapshot. The Gillespie algorithm (Kiss et al. 2017) is used to simulate the epidemic spreading on networks. Additionally, we follow the time discretization method for dynamic networks of Zhuang et al. (2013).
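As a rough illustration of event-driven simulation in the Gillespie style, the sketch below runs an SIS or SIR epidemic on a single static snapshot with a set of protected nodes. It is a simplified sketch under our own assumptions and not the simulator used in the experiments: protected nodes are treated as permanently immune, node labels are assumed to be integers, and the stationary-state and outbreak-peak criteria described above are not reproduced. The function name and signature are hypothetical.

```python
import random
from typing import Dict, Iterable, Set, Tuple


def gillespie_epidemic(adjacency: Dict[int, Set[int]],
                       attacked: Iterable[int],
                       protected: Iterable[int],
                       beta: float, delta: float, t_max: float,
                       rng: random.Random,
                       model: str = "SIS") -> Tuple[Set[int], Set[int]]:
    """Simulate one snapshot; return (currently infected, ever infected)."""
    immune = set(protected)               # assumption: protected nodes never get infected
    infected = set(attacked) - immune
    ever_infected = set(infected)
    t = 0.0
    while infected:
        # Every susceptible-infected contact is a possible infection event.
        at_risk = [(i, s) for i in sorted(infected)
                   for s in sorted(adjacency[i])
                   if s not in infected and s not in immune]
        infection_rate = beta * len(at_risk)
        recovery_rate = delta * len(infected)
        total_rate = infection_rate + recovery_rate
        if total_rate <= 0:
            break
        t += rng.expovariate(total_rate)  # exponential waiting time to the next event
        if t > t_max:
            break
        if rng.random() * total_rate < infection_rate:
            _, target = rng.choice(at_risk)
            infected.add(target)
            ever_infected.add(target)
        else:
            node = rng.choice(sorted(infected))
            infected.remove(node)
            if model == "SIR":
                immune.add(node)          # recovered nodes cannot be reinfected
    return infected, ever_infected
```

Averaging the outcome over many independent runs with different seeds mirrors the repeated-simulation protocol described in this subsection.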

Finally, all experiments are performed on the same machine, an Ubuntu 16.04 LTS PC with an Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz and an NVIDIA GTX 1080 Ti SLI GPU.

Evaluation of effectiveness on synthetic network

We evaluate the performance of all comparison methods on a synthetic network generated using the Dynamic Attributed Network with Community Structure Generator (DANCER; see footnote 2) (Largeron et al. 2017). Because the graph protection problem setting is simple, we consider only the temporal network structure of the generated network and ignore the attributes and community assignments provided by DANCER. We generate a dynamic network with 100 nodes and ten temporal snapshots (see footnote 3).

Table 4 shows the average result of 100 simulations under the protection budget constraint k=0.25N, where N is the number of nodes in the input graph. Both of our proposed methods obtain a higher ratio of surviving nodes than the other competitors, and ReProtect-p achieves the highest protection effectiveness.

Table 4 Ratio of surviving nodes (θ) on synthetic network

When we vary the given budget k, both of our proposed methods outperform the other baseline methods, as shown in Fig. 5. When the given protection budget k is very small, ReProtect and ReProtect-p perform competitively with the other methods, but as k increases, they clearly outperform other baseline methods such as Degree, GraphShield, and Betweenness. ReProtect also outperforms the other competing methods, although it needs a larger protection budget to reach performance similar to that of ReProtect-p. Introducing more data variety into the training process through graph perturbation helps our proposed method obtain better results, as in ReProtect-p.

Fig. 5

Effectiveness evaluation on synthetic network. Both of our proposed methods (green and orange colored lines) outperform the other competitors. Higher is better. a SIS Epidemic Model. b SIR Epidemic Model

Evaluation of effectiveness on real-world networks

On real-world networks, we compare the performance of all comparison methods on seven different datasets and two different epidemic models, i.e., the SIS and SIR models. Tables 5 and 6 show the ratio of surviving nodes under the SIS and SIR epidemic models, respectively. The results are averaged over 100 simulations under the protection budget constraint k=0.15N, where N is the number of nodes in the input graph. Both of our proposed methods consistently reach the highest ratio of surviving nodes. Additionally, in most cases, the proposed method trained with more data variety using the perturbed graph, namely ReProtect-p, achieves a better result than the regular training in ReProtect. Tables 7 and 8 present the standard deviation of the surviving nodes ratio.

Table 5 Ratio of surviving nodes (θ) on real-world networks (SIS Epidemic Model)
Table 6 Ratio of surviving nodes (θ) on real-world networks (SIR Epidemic Model)
Table 7 Standard deviation of the surviving nodes ratio (θ) (SIS Epidemic Model)
Table 8 Standard deviation of the surviving nodes ratio (θ) (SIR Epidemic Model)

To compare performance under different protection budgets k, we vary the given k as shown in Figs. 6 and 7. Both of our proposed methods outperform the other competitors as the given budget increases in all datasets, while consistently maintaining competitive performance when k is very small. The consistently better performance of our methods across many different protection budgets indicates their reliability as protection strategies.

Fig. 6

Effectiveness evaluation on SIS epidemic model. Both of our proposed methods (green and orange colored lines) outperform the other competitors. Higher is better. a Hospital. b Hypertext 2009. c PrimarySchool. d Highschool 2013. e Infectious. f Email

Fig. 7

Effectiveness evaluation on SIR epidemic model. Both of our proposed methods (green and orange colored lines) outperform the other competitors. Higher is better. a Hospital. b Hypertext 2009. c PrimarySchool. d Highschool 2013. e Infectious. f Email

We consider reinforcement learning to be more suitable for graph protection on dynamic networks for at least two major reasons. First, a reinforcement learning approach that uses a convolutional neural network as the function approximator offers the potential benefit of learning from the previously solved MVC of a network snapshot. Learning the temporal structure of the observed networks makes it possible to predict future protection from previously learned actions in the same dynamic network. This benefit of learning cannot be obtained with traditional MVC approximation algorithms. Second, convolutional neural networks (CNNs) not only scale to large networks, which may contain up to a billion nodes, but are also easily parallelizable across multiple CPUs and GPUs. Here, we build our approach on top of recent advances in deep learning technology. Traditional MVC approximation algorithms are not specifically designed for this computationally expensive task.

Evaluation of effectiveness on the aggregate networks

According to the observations of Braha and Bar-Yam (2006; 2009), the static snapshot networks are quite different from the aggregate network itself. The aggregate network is the network obtained by ignoring time and aggregating all of the temporal edges in the dynamic network (Braha and Bar-Yam 2006; 2009). An interesting question arises: how do the multiple-turns time-based protection strategies analyzed in our proposed methods compare with protection strategies implemented on the aggregate network? In this subsection, we report the effectiveness evaluation on the aggregate networks of the same synthetic and real-world datasets.

Tables 9 and 10 show the ratio of surviving nodes (θ) under the SIS and SIR epidemic models, respectively. The results are averaged over 100 simulations under the protection budget constraint k=0.15N, where N is the number of nodes in the input graph. Tables 11 and 12 present the standard deviation of the surviving nodes ratio. Compared with the results in Tables 5 and 6, we find that the multiple-turns time-based strategies are more effective than the aggregate-based strategies. The aggregate-based strategies are protection strategies applied on the aggregate networks under the assumption that the time-aggregated networks are accessible and known a priori. We observe that aggregating all edges over time makes the network denser and thus requires more nodes to be protected. Table 3 compares the average node degree in each snapshot of the network with that in the aggregated network. In the aggregated network, the average node degree is higher than in each network snapshot.
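The densification effect of time aggregation can be reproduced on a toy example. The sketch below builds the aggregate network as the union of undirected edges over all snapshots and compares average degrees; the function names and the toy snapshots are illustrative assumptions and are unrelated to the datasets in Table 3.

```python
from typing import FrozenSet, Hashable, Iterable, Sequence, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def aggregate_network(snapshots: Iterable[Sequence[Edge]]) -> Set[FrozenSet[Hashable]]:
    """Ignore time: take the union of undirected edges over all snapshots."""
    aggregated: Set[FrozenSet[Hashable]] = set()
    for edges in snapshots:
        for u, v in edges:
            aggregated.add(frozenset((u, v)))   # (u, v) and (v, u) collapse to one edge
    return aggregated


def average_degree(edge_set: Set[FrozenSet[Hashable]], num_nodes: int) -> float:
    """Average degree of an undirected graph: 2 * |E| / |V|."""
    return 2 * len(edge_set) / num_nodes
```

Because the union can only add edges, the average degree of the aggregate network is never lower than that of any single snapshot over the same node set.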

Table 9 Ratio of surviving nodes (θ) on the aggregate networks (SIS Epidemic Model)
Table 10 Ratio of surviving nodes (θ) on the aggregate networks (SIR Epidemic Model)
Table 11 Standard deviation of the surviving nodes ratio (θ) on the aggregate networks (SIS Epidemic Model)
Table 12 Standard deviation of the surviving nodes ratio (θ) on the aggregate networks (SIR Epidemic Model)

Evaluation of scalability

Recall our second evaluation goal, which is to measure how scalable the proposed methods are with respect to changes in graph size and in the budget size k. In this subsection, we report the results of the scalability evaluation by investigating the running time of our proposed methods. Different values of k are used to evaluate scalability at different sizes of the protection set.

To perform simulations with a varying number of nodes, we generate synthetic dynamic networks using the Dynamic Attributed Network with Community Structure Generator (DANCER) (Largeron et al. 2017). We consider only the temporal network structure of the generated networks and ignore the attributes and community assignments provided by DANCER. We generate dynamic networks with 10 temporal snapshots, with the number of nodes varied over N ∈ {100; 200; 300; 500; 1000; 1500; 2000}. The budget size is varied over {10; 20; 30; 40; 50}.

From Fig. 8, it can be inferred that our methods scale almost linearly with respect to the number of nodes. Hence, the proposed methods are scalable with respect to changes in graph size, which means they are applicable to large networks. Running our methods on a graph with 2000 nodes takes less than 9 seconds. Further parallelization of the neural network design can also be applied to reduce the running time.

Fig. 8

Scalability evaluation of the proposed methods. g ReProtect on SIS Model. h ReProtect on SIR Model. i ReProtect-p on SIS Model. j ReProtect-p on SIR Model

Evaluation of sensitivity to epidemic parameters

In the SIS and SIR models, the epidemic parameters consist of the infection rate (β) and the recovery rate (δ). To analyze the sensitivity of our proposed methods, this subsection compares their effectiveness under different epidemic parameters.

Prakash et al. (2011) demonstrated using empirical simulations that the ratio of the infection rate to the recovery rate \(\left (\frac {\beta }{\delta }\right)\) acts as the constant on which the epidemic threshold depends in various epidemic models, including SIS and SIR. The epidemic threshold is an intrinsic property of a network. When the strength of the virus is greater than the epidemic threshold, the epidemic breaks out (Prakash et al. 2011). The ratio \(\frac {\beta }{\delta }\) is commonly called the epidemic propagation rate (Wijayanto and Murata 2018b; Prakash et al. 2011).

We perform simulations to confirm the effectiveness of our proposed methods, using the same network dataset as in the subsection on effectiveness on the synthetic network, under three scenarios:

(1) Comparison of survival ratio θ when the epidemic propagation rate \(\left (\frac {\beta }{\delta }\right)\) changes

(2) Comparison of survival ratio θ when the infection rate (β) changes

(3) Comparison of survival ratio θ when the recovery rate (δ) changes

For a fair analysis and comparison, simulations are performed under a fixed protection budget (k).

Comparison of survival ratio θ when the epidemic propagation rate \(\left (\frac {\beta }{\delta }\right)\) changes

We vary the ratio \(\frac {\beta }{\delta }\) over \(\left \{\frac {0.9}{0.1}; \frac {0.8}{0.2}; \frac {0.7}{0.3}; \frac {0.6}{0.4}; \frac {0.5}{0.5}; \frac {0.4}{0.6}; \frac {0.3}{0.7}; \frac {0.2}{0.8}; \frac {0.1}{0.9}\right \}\). Figure 9 compares the survival ratio θ of all methods under the SIS and SIR epidemic models. The results are averaged over 100 simulations under the fixed protection budget k=0.25N, where N is the number of nodes of the input network. In all of these conditions, both of our proposed methods obtain a higher survival ratio θ than the other competitors.
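For reference, the nine (β, δ) pairs listed above can be generated and converted to propagation rates with exact arithmetic. The function name is hypothetical, and the observation that each pair sums to 1 is read off the listed values rather than stated as a rule in the text.

```python
from fractions import Fraction
from typing import List, Tuple


def propagation_rate_grid() -> List[Tuple[Fraction, Fraction, Fraction]]:
    """Return (beta, delta, beta / delta) for beta = 0.9, 0.8, ..., 0.1 and delta = 1 - beta."""
    grid = []
    for tenths in range(9, 0, -1):
        beta = Fraction(tenths, 10)
        delta = Fraction(10 - tenths, 10)
        grid.append((beta, delta, beta / delta))
    return grid
```

The resulting rates are 9, 4, 7/3, 3/2, 1, 2/3, 3/7, 1/4, and 1/9, so the sweep covers propagation rates both above and below 1. The default setting β=0.8, δ=0.2 used elsewhere in the evaluation corresponds to a rate of 4.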

Fig. 9

Evaluation of sensitivity to the epidemic propagation rate (the ratio of \(\frac {\beta }{\delta }\)). Higher is better. a SIS Model. b SIR Model

Comparison of survival ratio θ when the infection rate (β) changes

We vary the infection rate (β) over {0.9; 0.8; 0.7; 0.6; 0.5; 0.4; 0.3; 0.2; 0.1} with a fixed recovery rate (δ). Figure 10 compares the survival ratio θ of all methods. The results are averaged over 100 simulations with a fixed protection budget k=0.25N, where N is the number of nodes of the input network. Both of our proposed methods achieve the highest survival ratio θ regardless of the infection rate and the epidemic model.

Fig. 10

Evaluation of sensitivity to the infection rate (β). Higher is better. a SIS Model. b SIR Model

Comparison of survival ratio θ when the recovery rate (δ) changes

We compare the survival ratio θ while varying the recovery rate (δ) over {0.9; 0.8; 0.7; 0.6; 0.5; 0.4; 0.3; 0.2; 0.1} with a fixed infection rate (β) and the protection budget k=0.25N, where N is the number of nodes of the input network. As shown in Fig. 11, under both the SIS and SIR epidemic models, ReProtect and ReProtect-p obtain a higher survival ratio θ than the other competitors. The results are averaged over 100 simulations.

Fig. 11

Evaluation of sensitivity to the recovery rate (δ). Higher is better. a SIS Model. b SIR Model

Conclusion

In this paper, we addressed the multiple-turns graph protection problem to restrain epidemic spreading on dynamic networks. The protection budget is divided into several turns, and the protected nodes are selected based on the presently observed temporal snapshot of the dynamic network. Having proved the role of the minimum vertex cover (MVC) as the protection threshold of the network, we choose to protect the highest-degree MVC nodes up to the size of each allocated protection budget. We introduce methods that use n-step fitted Q-Learning to efficiently learn the MVC construction from the input graph under a reinforcement learning approach. A graph embedding technique is incorporated as a feature-based representation of the input network states. We demonstrate the effectiveness and scalability of our methods, namely ReProtect and ReProtect-p. Extensive evaluations on synthetic and real-world network datasets show that our proposed methods outperform other baseline methods while maintaining scalability. Further investigations with two different epidemic models, i.e., the SIS and SIR models, also confirm the effectiveness and scalability of our methods.

A strategy for handling the graph protection problem against non-trivial targeted attacks in dynamic networks is left for future work. Extending our methods to multi-agent policy gradient reinforcement learning to achieve better training efficiency will also be considered next.