Abstract
Identifying important network nodes is crucial for a variety of applications, such as the spread of an idea or an innovation. Most publications so far assume that the interactions between nodes are static. However, this approach neglects that real-world phenomena evolve in time. Thus, there is a need for tools and techniques that account for this evolution. In this direction, we present a novel graph-based method, named DepthRank (DR), that incorporates the temporal characteristics of the underlying datasets. We compare our approach against two baseline methods and find that it efficiently recovers important nodes on three real-world datasets, as indicated by the numerical simulations. Moreover, we perform our analysis on a modified version of the DBLP dataset and verify its correctness using ground truth data.
Notes
- 1.
- 2. The number of layers d within time window dt is not fixed because the layers may not be equally spaced.
- 3. The exponential decay function is commonly used in many applications; however, it can be replaced with any other monotonically decreasing function.
- 4.
- 5.
- 6.
References
Aggarwal, C.C., Lin, S., Yu, P.S.: On Influential Node Discovery in Dynamic Social Networks, pp. 636–647 (2012)
Cai, Q., Sun, L., Niu, J., Liu, Y., Zhang, J.: Disseminating real-time messages in opportunistic mobile social networks: a ranking perspective. In: 2015 IEEE International Conference on Communications (ICC), pp. 3228–3233 (2015)
van Eck, P.S., Jager, W., Leeflang, P.S.H.: Opinion leaders’ role in innovation diffusion: a simulation study. J. Prod. Innov. Manag. 28(2), 187–203 (2011)
Estrada, E.: The Structure of Complex Networks: Theory and Applications. Oxford University Press, Oxford (2011)
Gómez-Gardeñes, J., Echenique, P., Moreno, Y.: Immunization of real complex communication networks. Euro. Phys. J. B - Condens. Matter Complex Syst. 49(2), 259–264 (2006)
Jansen, B.J., Zhang, M., Sobel, K., Chowdury, A.: Twitter power: tweets as electronic word of mouth. J. Am. Soc. Inf. Sci. Technol. 60(11), 2169–2188 (2009)
Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence through a social network. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, New York, NY, USA, pp. 137–146 (2003)
Kendall, M.G.: A new measure of rank correlation. Biometrika 30(1–2), 81 (1938)
Kitsak, M., Gallos, L., Havlin, S., Liljeros, F., Muchnik, L., Stanley, H., Makse, H.: Identification of influential spreaders in complex networks. Nat. Phys. 6(11), 888–893 (2010)
Laflin, P., Mantzaris, A.V., Ainley, F., Otley, A., Grindrod, P., Higham, D.J.: Discovering and validating influence in a dynamic online social network. Soc. Netw. Anal. Min. 3(4), 1311–1323 (2013)
Lü, L., Chen, D., Ren, X.L., Zhang, Q.M., Zhang, Y.C., Zhou, T.: Vital nodes identification in complex networks. Phys. Rep. 650, 1–63 (2016)
Magnien, C., Tarissan, F.: Time evolution of the importance of nodes in dynamic networks. In: 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 1200–1207 (2015)
Michalski, R., Kajdanowicz, T., Bródka, P., Kazienko, P.: Seed selection for spread of influence in social networks: temporal vs. static approach. New Gener. Comput. 32(3), 213–235 (2014)
Morone, F., Makse, H.: Influence maximization in complex networks through optimal percolation. Nature 524(7563), 65–68 (2015)
Newman, M.: Networks: An Introduction. Oxford University Press, New York (2010)
Rocha, L., Masuda, N.: Individual-based approach to epidemic processes on arbitrary dynamic contact networks. Scientific Reports 6 (2016)
Rosas-Casals, M., Valverde, S., Solé, R.V.: Topological vulnerability of the European power grid under errors and attacks. Int. J. Bifurcat. Chaos 17(07), 2465–2475 (2007)
Saramäki, J., Moro, E.: From seconds to months: an overview of multi-scale dynamics of mobile telephone calls. Euro. Phys. J. B 88(6), 164 (2015)
Song, G., Li, Y., Chen, X., He, X., Tang, J.: Influential node tracking on dynamic social network: an interchange greedy approach. IEEE Trans. Knowl. Data Eng. 29(2), 359–372 (2017)
Stehlé, J., Voirin, N., Barrat, A., Cattuto, C., Isella, L., Pinton, J.F., Quaggiotto, M., van den Broeck, W., Régis, C., Lina, B., Vanhems, P.: High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE 6(8), e23176 (2011)
Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: ArnetMiner: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, New York, NY, USA, pp. 990–998 (2008)
Valdano, E., Ferreri, L., Poletto, C., Colizza, V.: Analytical computation of the epidemic threshold on temporal networks. Phys. Rev. X 5, 021005 (2015)
Vestergaard, C., Génois, M.: Temporal gillespie algorithm: fast simulation of contagion processes on time-varying networks. PLoS Comput. Biol. 11(10), e1004579 (2015)
Viswanath, B., Mislove, A., Cha, M., Gummadi, K.P.: On the evolution of user interaction in Facebook. In: Proceedings of the 2nd ACM Workshop on Online Social Networks, WOSN 2009, New York, NY, USA, pp. 37–42 (2009)
Zhuang, H., Sun, Y., Tang, J., Zhang, J., Sun, X.: Influence maximization in dynamic social networks. In: 2013 IEEE 13th International Conference on Data Mining, pp. 1313–1318 (2013)
Acknowledgement
This research was performed under the EU’s project “Trusted, Citizen - LEA collaboration over sOcial Networks (TRILLION)” (grant agreement No 653256).
Appendix
A Description of Method
In time-varying networks, nodes (e.g. users in a chat forum) interact with each other at given timestamps. These interactions are not homogeneously distributed; they are denser or sparser in certain time periods, depending on various factors (e.g. type of media, occurrence of an important event). We proceed by aggregating these interactions using appropriate time windows; the exact values per dataset are presented in Sect. 4. We denote by t the time step in the aggregated data.
After the aggregation, the data are transformed into a set of directed unweighted graphs \(G_i\), \(i=1,\ldots ,M\), which are called layers. Some of them are shown in Fig. 8 (left). In this setting, there are two types of links: intra-links, which connect two nodes in the same layer, and inter-links, which connect the same node in consecutive layers.
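The aggregation into layers can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_layers`, the bucketing rule (integer division of the offset from the first timestamp by the window length) and the toy events are assumptions made for the example.

```python
from collections import defaultdict

def build_layers(events, window):
    """Group (u, v, t) events into time-ordered layers of directed, unweighted edges.

    Assumption: an event at time t falls into step (t - t_min) // window.
    """
    t_min = min(t for _, _, t in events)
    buckets = defaultdict(set)
    for u, v, t in events:
        step = (t - t_min) // window  # aggregated time step
        buckets[step].add((u, v))     # a set drops repeated edges (unweighted)
    # Only non-empty windows become layers, so steps need not be equally spaced.
    return [(step, buckets[step]) for step in sorted(buckets)]

# Hypothetical events: (sender, receiver, timestamp in seconds)
events = [("a", "b", 0), ("a", "b", 5), ("b", "c", 12), ("c", "a", 47)]
layers = build_layers(events, window=10)
```

In this toy run, the two messages from a to b collapse into a single intra-link of the first layer, and no layers are created for the empty windows between steps 1 and 4.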
Next, we present the method for ranking important network nodes by evaluating their “influence score” S. This is a two-level procedure, comprising (a) the calculation of the “influence score” update for each node in consecutive time steps and (b) the application of a monotonically decreasing function to account for the decay of influence as time passes.
We denote by \(\varDelta H (s,G_i)\) the “influence score” update which refers to a node s residing in layer \(G_i\). It is calculated as follows: starting from \(G_i\), we specify the number of subsequent layers d within the time range \([t_i, t_i + dt]\), with dt the (fixed) length of the time window and \(t_i\) the time step corresponding to layer \(G_i\) (see Note 2). Then, we construct every possible path from node s in layer \(G_i\) to any node w in layer \(G_{i+d}\), which passes through nodes in the intermediate layers (Fig. 8 (right)). Note that we take into account both the inter- and intra-links during path construction.
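Each node w in layer \(G_{i+d}\) has \(r^{out}_{(w,G_{i+d})}\) outgoing links and \(r^{in}_{(w,G_{i+d-1})}\) incoming links in layer \(G_{i+d-1}\). We introduce the following relation:
$$\begin{aligned} H(w,G_{i+d}|s,G_{i}) = r^{in}_{(w,G_{i+d-1})} \cdot r^{out}_{(w,G_{i+d})} \end{aligned}$$(1)
which couples path diversity (\(r^{in}_{(w,G_{i+d-1})}\)) and transmission efficiency (\(r^{out}_{(w,G_{i+d})}\)), to denote the portion of the update \(\varDelta H\) attributed to node w. Thus, \(\varDelta H (s,G_{i})\) is given by:
$$\begin{aligned} \varDelta H(s,G_{i}) = \sum _{w} H(w,G_{i+d}|s,G_{i}) \end{aligned}$$(2)
where the sum runs over the nodes w in layer \(G_{i+d}\) that are reached by the paths starting from s.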
Referring to Fig. 8 (right), we have at layer \(i+3\) (i is the top layer): \(H(2,G_{i+3}|1,G_{i}) = 2 \cdot 1 = 2\), \(H(1,G_{i+3}|1,G_{i}) = 2 \cdot 2 = 4\), \(H(3,G_{i+3}|1,G_{i}) = 2 \cdot 1 = 2\), \(H(4,G_{i+3}|1,G_{i}) = 1 \cdot 1 = 1\), \(H(5,G_{i+3}|1,G_{i}) = 2 \cdot 1 = 2\) and \(H(6,G_{i+3}|1,G_{i}) = 1 \cdot 2 = 2\). Thus, for node \(s=1\) in layer \(G_i\), the update is \(\varDelta H(1,G_{i}) = 13\).
The “influence score” \(S(s,G_{i})\) of node s up to layer \(G_i\) is the sum of the update \(\varDelta H(s,G_{i})\) and its previous value \(S(s,G_{j})\) in layer \(G_j\) (the layer corresponding to the time step \(t_j^s\) of its most recent update), weighted using a forgetting function \(g(t_i-t_j^s)\) to account for aging effects:
$$\begin{aligned} S(s,G_i) = \varDelta H(s,G_i) + S(s,G_j) \cdot g(t_i-t_j^s) \end{aligned}$$(3)
In the following, we consider three different variants: no forgetting mechanism (\(g(t_i-t_j^s)=1\)), forgetting mechanism (\(g(t_i-t_j^s)=e^{-(t_i-t_j^s)}\)) and forgetting mechanism with normalization (\(g(t_i-t_j^s)=e^{-(t_i-t_j^s)/T}\)), with T the maximum time step of the aggregated dataset (see Note 3). Thus, the score \(S(s,G_i)\) can be formulated as follows:
- No forgetting mechanism:
$$\begin{aligned} S(s,G_i) = \varDelta H(s,G_i) + S(s,G_j) \end{aligned}$$(4)
- Forgetting mechanism:
$$\begin{aligned} S(s,G_i) = \varDelta H(s,G_i) + S(s,G_j) \cdot e^{-(t_i-t_j^s)} \end{aligned}$$(5)
- Forgetting mechanism with normalization:
$$\begin{aligned} S(s,G_i) = \varDelta H(s,G_i) + S(s,G_j) \cdot e^{-(t_i-t_j^s)/T} \end{aligned}$$(6)
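The three update rules differ only in the forgetting function, which the following sketch makes explicit. The names `forgetting` and `update_score` and the numeric values are hypothetical and chosen only for illustration.

```python
import math

def forgetting(variant, T):
    """Return g(elapsed) for the three variants; T is the maximum time step."""
    if variant == "DR":      # no forgetting mechanism, Eq. (4)
        return lambda elapsed: 1.0
    if variant == "DR-F":    # forgetting mechanism, Eq. (5)
        return lambda elapsed: math.exp(-elapsed)
    if variant == "DR-NF":   # forgetting mechanism with normalization, Eq. (6)
        return lambda elapsed: math.exp(-elapsed / T)
    raise ValueError(f"unknown variant: {variant}")

def update_score(prev_score, t_prev, delta, t_now, g):
    """S(s, G_i) = dH(s, G_i) + S(s, G_j) * g(t_i - t_j)."""
    return delta + prev_score * g(t_now - t_prev)

# Hypothetical values: previous score 13 at step 2, new update 5 at step 4, T = 10
scores = {
    name: update_score(13.0, 2, 5.0, 4, forgetting(name, T=10))
    for name in ("DR", "DR-F", "DR-NF")
}
```

With these values the old score is kept in full by DR, almost entirely discarded by DR-F, and only mildly discounted by DR-NF, which shows how the normalization by T softens the decay.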
The procedure ends when we reach the last layer for which \(t \le T-dt\). The overall “influence score” S for node s in the underlying dataset is given by (remember that \(G_j\) denotes the layer of the most recent update):
- No forgetting mechanism:
$$\begin{aligned} S(s) = S(s,G_j) \end{aligned}$$(7)
- Forgetting mechanism:
$$\begin{aligned} S(s) = S(s,G_j) \cdot e^{-(T-t^s_j)} \end{aligned}$$(8)
- Forgetting mechanism with normalization:
$$\begin{aligned} S(s) = S(s,G_j) \cdot e^{-(T-t^s_j)/T} \end{aligned}$$(9)
We finally rank the nodes in descending order according to their S value. In Algorithm 1, we provide the pseudo-code for our method, including the three variants defined previously: DepthRank (DR), where we do not impose any forgetting mechanism (Eqs. (4) and (7)), DepthRank with forgetting mechanism (DR-F) (Eqs. (5) and (8)) and DepthRank where we impose normalization (DR-NF) (Eqs. (6) and (9)).
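As an illustration of the final step (Eqs. (7) to (9) followed by the ranking), the sketch below applies the last decay to each node's most recent score and sorts the nodes. It is not Algorithm 1; the names `final_scores` and `rank_nodes`, the `state` layout and the numbers are assumptions made for the example.

```python
import math

def final_scores(state, T, variant="DR"):
    """state maps node -> (score at most recent update, time step of that update)."""
    result = {}
    for node, (score, t_last) in state.items():
        if variant == "DR":        # Eq. (7)
            g = 1.0
        elif variant == "DR-F":    # Eq. (8)
            g = math.exp(-(T - t_last))
        elif variant == "DR-NF":   # Eq. (9)
            g = math.exp(-(T - t_last) / T)
        else:
            raise ValueError(f"unknown variant: {variant}")
        result[node] = score * g
    return result

def rank_nodes(scores):
    """Nodes in descending order of their overall score S."""
    return sorted(scores, key=lambda n: scores[n], reverse=True)

# Hypothetical state with T = 10: node "a" was last updated long ago.
state = {"a": (10.0, 2), "b": (6.0, 9), "c": (8.0, 10)}
ranking_dr = rank_nodes(final_scores(state, T=10, variant="DR"))
ranking_drf = rank_nodes(final_scores(state, T=10, variant="DR-F"))
```

In this toy case node a leads without forgetting but drops to last place under DR-F, because its score has not been refreshed since step 2.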
B Baseline Methods and Evaluation Metrics
In this section, we present the baseline methods along with the evaluation metrics. For the former, we have chosen k-shell (kS) and Collective Influence (CI) methods because they are widely employed in the context of graph-based techniques.
k-shell is broadly used for ranking purposes. Given a graph G, it proceeds as follows: every isolated node is considered to be in the 0-shell. For the rest of the nodes, which have connectivity \(k \ge 1\), one first removes every node with \(k = 1\) together with its links. The residual graph may again contain nodes with connectivity \(k \le 1\); thus, the same procedure is repeated until no such nodes remain. The nodes removed in this way constitute the 1-shell. The rest of the shells are uncovered in the same way. The nodes belonging to the highest shells are the most important [9].
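The pruning procedure can be sketched in a few lines. This is a generic k-shell decomposition written for illustration; treating the aggregated graph as undirected and ignoring self-loops are assumptions, and the function name `k_shell` and the toy graph are hypothetical.

```python
def k_shell(edges, nodes=()):
    """Return a dict node -> shell index for an undirected, unweighted graph."""
    adj = {n: set() for n in nodes}  # `nodes` lets isolated nodes be included
    for u, v in edges:
        if u == v:
            continue  # assumption: self-loops are ignored
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    degree = {n: len(nb) for n, nb in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        stack = [n for n in remaining if degree[n] <= k]
        if not stack:
            k += 1  # nothing left to prune at this level, move to the next shell
            continue
        while stack:
            n = stack.pop()
            if n not in remaining:
                continue
            remaining.discard(n)
            shell[n] = k
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        stack.append(m)  # pruning n exposed m at the same level
    return shell

# Triangle a-b-c with a pendant node d and an isolated node e
shells = k_shell([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")], nodes=["e"])
```

The isolated node lands in the 0-shell, the pendant node in the 1-shell, and the triangle in the 2-shell.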
Collective Influence is based on optimal percolation. According to [14], it takes the form:
$$\begin{aligned} CI_l(i) = (r_i - 1) \sum _{j \in \partial Ball(i,l)} (r_j - 1) \end{aligned}$$(10)
where \(r_i\) is the degree of node i and \(\partial Ball(i,l)\) is the front of a sphere centered at the i-th node with radius l (in terms of shortest path distance). We follow the suggestions of the authors in [14] and set \(l=2\) and \(l=3\) in the calculations.
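A short sketch of the Collective Influence score of a single node, assuming an undirected graph stored as an adjacency dictionary. It evaluates the score only; the iterative node-removal scheme of [14] is not reproduced. The names `collective_influence` and `to_adjacency` and the toy graph are hypothetical.

```python
from collections import deque

def to_adjacency(edges):
    """Build an undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def collective_influence(adj, i, l):
    """(r_i - 1) times the sum of (r_j - 1) over nodes j at distance exactly l from i."""
    dist = {i: 0}
    queue = deque([i])
    frontier = []
    while queue:  # breadth-first search, stopped at radius l
        u = queue.popleft()
        if dist[u] == l:
            frontier.append(u)
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj[i]) - 1) * sum(len(adj[j]) - 1 for j in frontier)

# Hypothetical tree: c is linked to a and b; a has two leaves, b has one
adj = to_adjacency([("c", "a"), ("c", "b"), ("a", "a1"), ("a", "a2"), ("b", "b1")])
ci_c_radius1 = collective_influence(adj, "c", 1)
ci_c_radius2 = collective_influence(adj, "c", 2)
```

For node c the score is 3 at radius 1 and 0 at radius 2, because every node at distance 2 is a leaf with degree 1.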
To assess the spreading ability of the important network nodes uncovered by each method, we apply a temporal implementation of the SIR model, called Temporal Gillespie Algorithm [23] (we choose the homogeneous Poisson version), to account for the time-varying interactions in real-world networks. We have slightly modified the process in order to start from a set of initially infected nodes rather than a single one. These sets coincide with the ordered sets identified by the methods used. The efficiency of spreading is evaluated by using the maximum number of infected nodes within the whole dataset, \(I_{max} = \max \limits _{1 \le t \le T}{I(t)}\). \(\beta \) and \(\gamma \) are the infection and recovery rates, respectively. For the small CollegeMsg dataset, we have calculated the epidemic threshold for a fixed \(\gamma \) value as in [22]; for the rest of the datasets, the rates were chosen heuristically, by searching for those values of \(\beta \) and \(\gamma \) for which a considerable amount of spreading is observed. The exact values are displayed in the figure captions in Sect. 4.
We are also interested in whether each ranking method orders the nodes correctly. A suitable measure is the \(\tau \)-Kendall [8], which evaluates the concordance of two ordered lists. We exploit it in the following way:
- We collect all the nodes tagged as important from every method o. We calculate \(I_{max}\) for each one and rank them according to this value. This will be the overall “ground-truth” ranking list, L.
- We select the m first nodes according to each method ranking, \(R_o\), and find their position in L. This will be the method specific “ground-truth” ranking list, \(L_{o}\).
- We calculate the \(\tau \)-Kendall between \(L_o\) and \(R_o\).
This evaluation is applied for various lengths m of the most important nodes lists (see respective figures in Sect. 4).
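The three evaluation steps can be sketched as follows. The sketch assumes that \(I_{max}\) has already been obtained for every collected node and that there are no ties, so the plain (tau-a) form of the coefficient is used. The names `kendall_tau` and `method_tau` and the \(I_{max}\) values are hypothetical.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a for two equal-length sequences without ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

def method_tau(i_max, method_ranking, m):
    """Compare the top-m nodes of one method with their order in the overall list L."""
    overall = sorted(i_max, key=i_max.get, reverse=True)   # list L
    position = {node: p for p, node in enumerate(overall)}
    top = method_ranking[:m]
    l_o = [position[node] for node in top]                 # positions in L
    r_o = list(range(len(top)))                            # positions in R_o
    return kendall_tau(r_o, l_o)

# Hypothetical I_max values and a method that swaps the 2nd and 3rd node
i_max = {"a": 50, "b": 40, "c": 30, "d": 20}
tau = method_tau(i_max, ["a", "c", "b", "d"], m=4)
```

With one swapped pair out of six, five pairs are concordant and one is discordant, giving a coefficient of 4/6.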
C Datasets
We have performed the experiments using the following publicly available real world datasets:
- CollegeMsg (Note 4): It is a small temporal network corresponding to a private messaging facility at the University of California, Irvine. It consists of 1899 users and 59835 temporal edges. The timestamps are in UNIX time (seconds).
- Facebook (Note 5): It consists of 855542 Facebook wall posts between 45813 unique users. The timestamps are in UNIX time (seconds).
- Data Mining (DM) temporal citation network: We have used the DBLP dataset of [21] which consists of all the papers in computer science up to 2010. The timestamps are in years.
The preparation of the DM dataset was performed as follows: we have selected the papers that contain one or more authors from a list of DM experts (Note 6). Each article record contains a set of indexes to other articles which cite the current one. We use this information to create the connections between the authors citing a paper and those of the cited paper per year of citation. For example, assume that we pick a record of the form: (title1, author1, author2, author3, 2009) which cites, among other papers, the following: (title2, cauthor1, cauthor2, 1993). We formulate the authors' temporal interaction patterns as: (cauthor1, author1, 2009), (cauthor1, author2, 2009), (cauthor1, author3, 2009), (cauthor2, author1, 2009), (cauthor2, author2, 2009), (cauthor2, author3, 2009). Each tuple means that the first author influenced the second author in year 2009. In this way, we construct the temporal citation network for the years 1993–2010.
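The expansion of one citation into author-level interactions amounts to a Cartesian product of the two author lists. A minimal sketch, with the hypothetical function name `citation_interactions`, reproduces the example above:

```python
from itertools import product

def citation_interactions(citing_authors, cited_authors, year):
    """One (cited author, citing author, citation year) tuple per author pair."""
    return [
        (cited, citing, year)
        for cited, citing in product(cited_authors, citing_authors)
    ]

# The record from the example: a 2009 paper citing a 1993 paper
interactions = citation_interactions(
    citing_authors=["author1", "author2", "author3"],
    cited_authors=["cauthor1", "cauthor2"],
    year=2009,
)
```

The timestamp is the year of the citing paper (2009), not the publication year of the cited one.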
Each of the previous datasets comes in the form (u, v, t), where u is the sender of a message or the author being cited and v is the receiver of the message or the author citing an article at time t. In order to apply the k-shell and CI methods, we aggregate the interactions in a static unweighted graph.
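For the static baselines, the aggregation simply discards the timestamps and repeated interactions. A minimal sketch, assuming that edge direction is kept and using the hypothetical name `aggregate_static`:

```python
def aggregate_static(events):
    """Collapse (u, v, t) events into a set of directed, unweighted edges."""
    return {(u, v) for u, v, _ in events}

# Hypothetical events: the repeated pair (a, b) yields a single edge
static_edges = aggregate_static([("a", "b", 1), ("a", "b", 7), ("b", "c", 3)])
```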
D Detailed Results for \(\tau \)-Kendall
In this section, we provide more detailed plots for the \(\tau \)-Kendall, by incorporating all the methods and parameters used (Figs. 9, 10 and 11).
E Comparison with Ground-Truth in DM Temporal Citation Network
In Table 2, we present the number of important network nodes identified by each method that are common with those in the ground-truth list for the case of DM temporal citation network (see Sect. 4), as m increases.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Bastas, N., Semertzidis, T., Daras, P. (2017). DepthRank: Exploiting Temporality to Uncover Important Network Nodes. In: Ciampaglia, G., Mashhadi, A., Yasseri, T. (eds) Social Informatics. SocInfo 2017. Lecture Notes in Computer Science, vol 10540. Springer, Cham. https://doi.org/10.1007/978-3-319-67256-4_12
DOI: https://doi.org/10.1007/978-3-319-67256-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67255-7
Online ISBN: 978-3-319-67256-4
eBook Packages: Computer Science (R0)