DepthRank: Exploiting Temporality to Uncover Important Network Nodes

Bastas, Nikolaos; Semertzidis, Theodoros; Daras, Petros

doi:10.1007/978-3-319-67256-4_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10540)

Included in the following conference series:

International Conference on Social Informatics

Abstract

Identifying important network nodes is crucial for a variety of applications, such as the spread of an idea or an innovation. The majority of the publications so far assume that the interactions between nodes are static. However, this approach neglects that real-world phenomena evolve in time. Thus, there is a need for tools and techniques which account for evolution over time. Towards this direction, we present a novel graph-based method, named DepthRank (DR), that incorporates the temporal characteristics of the underlying datasets. We compare our approach against two baseline methods and find that it efficiently recovers important nodes on three real-world datasets, as indicated by the numerical simulations. Moreover, we perform our analysis on a modified version of the DBLP dataset and verify its correctness using ground truth data.

Notes

1. http://www.kdnuggets.com/2012/03/top-10-in-data-mining.html
2. The number of layers d within time window dt is not fixed because the layers may not be equally spaced.
3. The exponential decay function is commonly used in many applications; however, it can be replaced with any other monotonically decreasing function.
4. https://snap.stanford.edu/data/CollegeMsg.html
5. http://konect.uni-koblenz.de/networks/facebook-wosn-wall
6. https://static.aminer.org/lab-datasets/expertfinding/datasets/Data-Mining.txt

References

1. Aggarwal, C.C., Lin, S., Yu, P.S.: On Influential Node Discovery in Dynamic Social Networks, pp. 636–647 (2012)
2. Cai, Q., Sun, L., Niu, J., Liu, Y., Zhang, J.: Disseminating real-time messages in opportunistic mobile social networks: a ranking perspective. In: 2015 IEEE International Conference on Communications (ICC), pp. 3228–3233 (2015)
3. van Eck, P.S., Jager, W., Leeflang, P.S.H.: Opinion leaders’ role in innovation diffusion: a simulation study. J. Prod. Innov. Manag. 28(2), 187–203 (2011)
4. Estrada, E.: The Structure of Complex Networks: Theory and Applications. Oxford University Press, Oxford (2011)
5. Gómez-Gardeñes, J., Echenique, P., Moreno, Y.: Immunization of real complex communication networks. Euro. Phys. J. B - Condens. Matter Complex Syst. 49(2), 259–264 (2006)
6. Jansen, B.J., Zhang, M., Sobel, K., Chowdury, A.: Twitter power: tweets as electronic word of mouth. J. Am. Soc. Inf. Sci. Technol. 60(11), 2169–2188 (2009)
7. Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence through a social network. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, New York, NY, USA, pp. 137–146 (2003)
8. Kendall, M.G.: A new measure of rank correlation. Biometrika 30(1–2), 81 (1938)
9. Kitsak, M., Gallos, L., Havlin, S., Liljeros, F., Muchnik, L., Stanley, H., Makse, H.: Identification of influential spreaders in complex networks. Nat. Phys. 6(11), 888–893 (2010)
10. Laflin, P., Mantzaris, A.V., Ainley, F., Otley, A., Grindrod, P., Higham, D.J.: Discovering and validating influence in a dynamic online social network. Soc. Netw. Anal. Min. 3(4), 1311–1323 (2013)
11. Lü, L., Chen, D., Ren, X.L., Zhang, Q.M., Zhang, Y.C., Zhou, T.: Vital nodes identification in complex networks. Phys. Rep. 650, 1–63 (2016)
12. Magnien, C., Tarissan, F.: Time evolution of the importance of nodes in dynamic networks. In: 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 1200–1207 (2015)
13. Michalski, R., Kajdanowicz, T., Bródka, P., Kazienko, P.: Seed selection for spread of influence in social networks: temporal vs. static approach. New Gener. Comput. 32(3), 213–235 (2014)
14. Morone, F., Makse, H.: Influence maximization in complex networks through optimal percolation. Nature 524(7563), 65–68 (2015)
15. Newman, M.: Networks: An Introduction. Oxford University Press, New York (2010)
16. Rocha, L., Masuda, N.: Individual-based approach to epidemic processes on arbitrary dynamic contact networks. Scientific Reports 6 (2016)
17. Rosas-Casals, M., Valverde, S., Solé, R.V.: Topological vulnerability of the European power grid under errors and attacks. Int. J. Bifurcat. Chaos 17(07), 2465–2475 (2007)
18. Saramäki, J., Moro, E.: From seconds to months: an overview of multi-scale dynamics of mobile telephone calls. Euro. Phys. J. B 88(6), 164 (2015)
19. Song, G., Li, Y., Chen, X., He, X., Tang, J.: Influential node tracking on dynamic social network: an interchange greedy approach. IEEE Trans. Knowl. Data Eng. 29(2), 359–372 (2017)
20. Stehlé, J., Voirin, N., Barrat, A., Cattuto, C., Isella, L., Pinton, J.F., Quaggiotto, M., van den Broeck, W., Régis, C., Lina, B., Vanhems, P.: High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE 6(8), e23176 (2011)
21. Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: ArnetMiner: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, New York, NY, USA, pp. 990–998 (2008)
22. Valdano, E., Ferreri, L., Poletto, C., Colizza, V.: Analytical computation of the epidemic threshold on temporal networks. Phys. Rev. X 5, 021005 (2015)
23. Vestergaard, C., Génois, M.: Temporal gillespie algorithm: fast simulation of contagion processes on time-varying networks. PLoS Comput. Biol. 11(10), e1004579 (2015)
24. Viswanath, B., Mislove, A., Cha, M., Gummadi, K.P.: On the evolution of user interaction in Facebook. In: Proceedings of the 2nd ACM Workshop on Online Social Networks, WOSN 2009, New York, NY, USA, pp. 37–42 (2009)
25. Zhuang, H., Sun, Y., Tang, J., Zhang, J., Sun, X.: Influence maximization in dynamic social networks. In: 2013 IEEE 13th International Conference on Data Mining, pp. 1313–1318 (2013)

Acknowledgement

This research was performed under the EU’s project “Trusted, Citizen - LEA collaboration over sOcial Networks (TRILLION)” (grant agreement No 653256).

Author information

Authors and Affiliations

Centre for Research and Technology Hellas, Thessaloniki, Greece
Nikolaos Bastas, Theodoros Semertzidis & Petros Daras

Corresponding author

Correspondence to Nikolaos Bastas.

Editor information

Editors and Affiliations

Indiana University, Bloomington, Indiana, USA
Giovanni Luca Ciampaglia
University of Washington, Seattle, Washington, USA
Afra Mashhadi
University of Oxford, Oxford, United Kingdom
Taha Yasseri

Appendix

A Description of Method

In time-varying networks, nodes (e.g. users in a chat forum) interact with each other at given timestamps. These interactions are not homogeneously distributed; they are denser or sparser in certain time periods, depending on various factors (e.g. type of media, occurrence of an important event, etc.). In the following, we aggregate these interactions using appropriate time windows, with the exact values per dataset presented in Sect. 4. We denote by t the time step in the aggregated data.

After the aggregation, the data are transformed into a set of directed unweighted graphs $G_i$, $i=1,\dots,M$, which are called layers. A subset of them is shown in Fig. 8 (left). In this setting, there are two types of links: intra-links that connect two nodes in the same layer and inter-links that connect identical nodes in consecutive layers.
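The aggregation step can be illustrated with a minimal sketch. The function name `build_layers`, the fixed-width bucketing rule and the choice to measure time from the earliest event are assumptions made for illustration; the source only states that interactions are aggregated with a per-dataset time window. Only the intra-links are stored, since the inter-links between copies of the same node are implicit.

```python
from collections import defaultdict


def build_layers(events, window):
    """Aggregate (u, v, t) events into directed, unweighted layers.

    events: iterable of (sender, receiver, timestamp) tuples.
    window: aggregation window length, in the units of the timestamps.
    Returns a list of (time_step, edge_set) pairs sorted by time step.
    Windows without any interaction produce no layer.
    """
    events = list(events)
    if not events:
        return []
    t0 = min(t for _, _, t in events)
    buckets = defaultdict(set)
    for u, v, t in events:
        step = int((t - t0) // window)  # index of the aggregation window
        buckets[step].add((u, v))       # a set drops repeated interactions
    return sorted(buckets.items())


layers = build_layers(
    [("a", "b", 0), ("a", "b", 5), ("b", "c", 12), ("c", "a", 31)], window=10
)
```

Because empty windows are skipped, consecutive layers need not be equally spaced in time, which is the situation described in Footnote 2.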

Next, we present the method for ranking important network nodes by evaluating their “influence score” S. This is a two-level procedure, incorporating (a) the calculation of the “influence score” update for each node in consecutive time steps and (b) the application of a monotonically decreasing function to account for the influence decay as time passes.

We denote by $\varDelta H (s,G_i)$ the “influence score” update which refers to a node s residing in layer $G_i$. It is calculated as follows: starting from $G_i$, we specify the number of subsequent layers d within the time range $[t_i, t_i + dt]$, with dt the (fixed) length of the time window and $t_i$ the time step corresponding to layer $G_i$.^{Footnote 2} Then, we construct every possible path from node s in layer $G_i$ to any node w in layer $G_{i+d}$, which passes through nodes in the intermediate layers (Fig. 8 (right)). Note that we take into account both the inter- and intra-links during path construction.

Table 1. List of notations used in the text

Each node w in layer $G_{i+d}$ has $r^{out}_{(w,G_{i+d})}$ outgoing links and $r^{in}_{(w,G_{i+d-1})}$ incoming links in layer $G_{i+d-1}$. We introduce the following relation:

$$ H(w,G_{i+d}|s,G_i) = \begin{cases} r^{in}_{(w,G_{i+d-1})} \cdot r^{out}_{(w,G_{i+d})}, & \text{if } dist((w,G_{i+d}),(s,G_{i})) < 2 \cdot d + 1 \\ 0, & \text{otherwise} \end{cases} $$

which couples path diversity ($r^{in}_{(w,G_{i+d-1})}$) and transmission efficiency ($r^{out}_{(w,G_{i+d})}$), to denote the portion of the update $\varDelta H$ attributed to node w. Thus, $\varDelta H (s,G_{i})$ is given by:

$$\begin{aligned} \varDelta H(s,G_{i}) = \sum _{w \in G_{i+d}}{H(w,G_{i+d}|s,G_{i})} \end{aligned}$$

Referring to Fig. 8 (right), we have at layer $i+3$ (i is the top layer): $H(2,G_{i+3}|1,G_{i}) = 2 \cdot 1 = 2$, $H(1,G_{i+3}|1,G_{i}) = 2 \cdot 2 = 4$, $H(3,G_{i+3}|1,G_{i}) = 2 \cdot 1 = 2$, $H(4,G_{i+3}|1,G_{i}) = 1 \cdot 1 = 1$, $H(5,G_{i+3}|1,G_{i}) = 2 \cdot 1 = 2$ and $H(6,G_{i+3}|1,G_{i}) = 1 \cdot 2 = 2$. Thus, for node $s=1$ in layer $G_i$, the update is $\varDelta H(1,G_{i}) = 13$.
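The update $\varDelta H$ can be sketched in code under one possible reading of the definition. The assumptions here are: layers are given as a list of edge sets, d is passed in directly as a number of layers, intra-links follow edge direction, inter-links connect a node to its own copy in the next layer, and dist is the shortest path length in this multilayer graph. The function name `delta_h` is hypothetical, and the sketch has not been checked against Fig. 8.

```python
from collections import deque


def delta_h(layers, i, d, s):
    """Influence-score update for node s in layer i, looking d layers ahead.

    layers: list of sets of directed edges (u, v), one set per layer.
    """
    if d < 1 or i + d >= len(layers):
        return 0
    # Out-neighbours per layer, used to follow intra-links.
    out_nbrs = []
    for edges in layers:
        nbrs = {}
        for u, v in edges:
            nbrs.setdefault(u, set()).add(v)
        out_nbrs.append(nbrs)

    # Breadth-first search over (node, layer) pairs.
    dist = {(s, i): 0}
    queue = deque([(s, i)])
    while queue:
        node, layer = queue.popleft()
        steps = dist[(node, layer)]
        if steps == 2 * d:  # longer paths cannot satisfy dist < 2d + 1
            continue
        nxt = [(v, layer) for v in out_nbrs[layer].get(node, ())]  # intra-links
        if layer < i + d:
            nxt.append((node, layer + 1))  # inter-link to the next layer
        for state in nxt:
            if state not in dist:
                dist[state] = steps + 1
                queue.append(state)

    last, prev = layers[i + d], layers[i + d - 1]
    total = 0
    for (w, layer) in dist:
        if layer != i + d:
            continue
        r_out = sum(1 for u, _ in last if u == w)  # out-links in G_{i+d}
        r_in = sum(1 for _, v in prev if v == w)   # in-links in G_{i+d-1}
        total += r_in * r_out
    return total


toy_layers = [{(1, 2)}, {(2, 3)}, {(3, 4), (3, 5)}]
```

On this toy input, node 1 reaches node 3 in the third layer within four hops, and node 3 has one incoming link in the second layer and two outgoing links in the third, so the update for `d=2` is 2.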

The “influence score” $S(s,G_{i})$ of node s up to layer $G_i$ is the sum of the update $\varDelta H(s,G_{i})$ and its previous value $S(s,G_{j})$ in layer $G_j$ (which corresponds to the time step $t_j^s$ of its most recent update), the latter weighted by a forgetting function $g(t_i-t_j^s)$ to account for aging effects:

$$\begin{aligned} S(s,G_i) = \varDelta H(s,G_{i}) + S(s,G_{j}) \cdot g(t_i-t_j^s) \end{aligned}$$

In the following, we consider three different variants: no forgetting mechanism ($g(t_i-t_j^s)=1$), forgetting mechanism ($g(t_i-t_j^s)=e^{-(t_i-t_j^s)}$) and forgetting mechanism with normalization ($g(t_i-t_j^s)=e^{-(t_i-t_j^s)/T}$), with T the maximum time step of the aggregated dataset.^{Footnote 3} Thus, the score $S(s,G_i)$, can be formulated as follows:

No forgetting mechanism:
$$\begin{aligned} S(s,G_i) = \varDelta H(s,G_i) + S(s,G_j) \end{aligned}$$
(4)
Forgetting mechanism:
$$\begin{aligned} S(s,G_i) = \varDelta H(s,G_i) + S(s,G_j) \cdot e^{-(t_i-t_j^s)} \end{aligned}$$
(5)
Forgetting mechanism with normalization:
$$\begin{aligned} S(s,G_i) = \varDelta H(s,G_i) + S(s,G_j) \cdot e^{-(t_i-t_j^s)/T} \end{aligned}$$
(6)

The procedure ends when we reach the last layer for which $t \le T-dt$. The overall “influence score” S for node s in the underlying dataset is given by (recall that $G_j$ denotes the layer of the most recent update):

No forgetting mechanism:
$$\begin{aligned} S(s) = S(s,G_j) \end{aligned}$$
(7)
Forgetting mechanism:
$$\begin{aligned} S(s) = S(s,G_j) \cdot e^{-(T-t^s_j)} \end{aligned}$$
(8)
Forgetting mechanism with normalization:
$$\begin{aligned} S(s) = S(s,G_j) \cdot e^{-(T-t^s_j)/T} \end{aligned}$$
(9)

We finally rank the nodes in descending order according to their S value. In Algorithm 1, we provide the pseudo-code for our method, including the three variants defined previously: DepthRank (DR), where we do not impose any forgetting mechanism (Eqs. (4) and (7)), DepthRank with forgetting mechanism (DR-F) (Eqs. (5) and (8)) and DepthRank where we impose normalization (DR-NF) (Eqs. (6) and (9)).
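As a hedged illustration of Eqs. (4)–(9) and the final ranking, the sketch below accumulates scores from precomputed updates. The input format (a dict mapping each node to a list of `(time_step, delta_H)` pairs), the function names and the initial score of zero are assumptions for illustration; this is not the authors' Algorithm 1.

```python
import math


def forgetting(variant, elapsed, T):
    """Forgetting function g(elapsed) for the variants DR, DR-F and DR-NF."""
    if variant == "DR":
        return 1.0
    if variant == "DR-F":
        return math.exp(-elapsed)
    if variant == "DR-NF":
        return math.exp(-elapsed / T)
    raise ValueError(f"unknown variant: {variant}")


def depthrank_scores(updates, T, variant="DR"):
    """Return {node: overall score S} from per-node lists of updates."""
    scores = {}
    for node, series in updates.items():
        score, t_prev = 0.0, None
        for t, delta in sorted(series):
            if t_prev is not None:
                score *= forgetting(variant, t - t_prev, T)  # age the old score
            score += delta                                   # Eqs. (4)-(6)
            t_prev = t
        if t_prev is not None:
            score *= forgetting(variant, T - t_prev, T)      # Eqs. (7)-(9)
        scores[node] = score
    return scores


def rank_nodes(scores):
    """Nodes in descending order of score; ties broken by node label."""
    return sorted(scores, key=lambda n: (-scores[n], str(n)))


updates = {"a": [(1, 2.0), (3, 4.0)], "b": [(2, 5.0)]}
dr = depthrank_scores(updates, T=4, variant="DR")
dr_f = depthrank_scores(updates, T=4, variant="DR-F")
```

With no forgetting, node "a" scores 6 and node "b" scores 5. With DR-F, both scores are additionally discounted by the time elapsed between the last update and T.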

B Baseline Methods and Evaluation Metrics

In this section, we present the baseline methods along with the evaluation metrics. For the former, we have chosen k-shell (kS) and Collective Influence (CI) methods because they are widely employed in the context of graph-based techniques.

k-shell is broadly used for ranking purposes. Given a graph G, it proceeds as follows: every isolated node is considered as being in the 0-shell. For the rest of the nodes, which have connectivity $k \ge 1$, one first removes every node with $k = 1$ together with its links. The residual graph may again contain nodes with connectivity $k \le 1$; thus, the same procedure is repeated until no such nodes remain. The removed nodes constitute the 1-shell. The rest of the shells are uncovered in the same way, by increasing k. The nodes belonging to the highest shells are the most important [9].
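The peeling procedure can be sketched as follows, assuming the static graph is treated as undirected and stored as a dict of neighbour sets (the representation and the function name `k_shell` are illustrative choices, not taken from the source).

```python
def k_shell(adjacency):
    """Return {node: shell index} for an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours.
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose current degree is at most k belong to the k-shell.
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:  # already removed via a duplicate entry
                continue
            remaining.discard(v)
            shell[v] = k
            for u in adjacency[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)  # removal exposed another k-shell node
    return shell


graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": set(),
}
shells = k_shell(graph)
```

In this toy graph the isolated node "e" is in the 0-shell, the pendant node "d" is in the 1-shell, and the triangle "a", "b", "c" forms the 2-shell.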

Collective Influence is based on optimal percolation. According to [14], it takes the form:

$$\begin{aligned} CI_l(i) = (r_i - 1) \sum _{j \in \partial Ball(i,l)}{(r_j - 1)} \end{aligned}$$
(10)

where $r_i$ is the degree of node i and $\partial Ball(i,l)$ is the front of a sphere centered at the i-th node with radius l (in terms of shortest path distance). We follow the suggestions of the authors in [14] and set $l=2$ and 3 in the calculations.
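A minimal sketch of the Collective Influence score of [14] follows: the degree of node i minus one, multiplied by the sum of (degree minus one) over the nodes at shortest-path distance exactly l from i. The undirected dict-of-sets representation and the function name are assumptions for illustration.

```python
from collections import deque


def collective_influence(adjacency, i, l):
    """Collective Influence of node i for radius l on an undirected graph."""
    dist = {i: 0}
    queue = deque([i])
    frontier = []
    while queue:
        v = queue.popleft()
        if dist[v] == l:
            frontier.append(v)  # node on the front of the sphere
            continue
        for u in adjacency[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    reduced = lambda v: len(adjacency[v]) - 1  # degree minus one
    return reduced(i) * sum(reduced(j) for j in frontier)


tree = {
    0: {1, 2},
    1: {0, 3},
    2: {0, 4},
    3: {1, 5, 6, 7},
    4: {2},
    5: {3},
    6: {3},
    7: {3},
}
```

For node 0 and radius 2, the frontier consists of nodes 3 and 4, giving (2 - 1) * ((4 - 1) + (1 - 1)) = 3.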

To assess the spreading ability of the important network nodes uncovered by each method, we apply a temporal implementation of the SIR model, called Temporal Gillespie Algorithm [23] (we use the homogeneous Poisson version), to account for the time-varying interactions in real-world networks. We have slightly modified the process in order to start from a set of initially infected nodes rather than a single one. These sets coincide with the ordered sets identified by the methods used. The efficiency of spreading is evaluated by using the maximum number of infected nodes within the whole dataset, $I_{max} = \max \limits _{1 \le t \le T}{I(t)}$. Here, $\beta $ and $\gamma $ are the infection and recovery rates, respectively. For the small CollegeMsg dataset, we have calculated the epidemic threshold for a fixed $\gamma $ value as in [22]; for the rest of the datasets, the parameters were chosen heuristically, by searching for those values of $\beta $ and $\gamma $ for which a considerable amount of spreading is observed. The exact values are displayed in the figure captions in Sect. 4.
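To make the role of $I_{max}$ concrete, the sketch below runs a simplified discrete-time SIR process over a sequence of layers. This is explicitly not the Temporal Gillespie Algorithm of [23]: it treats $\beta $ and $\gamma $ as per-step probabilities rather than rates, and the function name, input format and seeding of the random generator are assumptions for illustration only.

```python
import random


def temporal_sir_imax(layers, seeds, beta, gamma, rng=None):
    """Discrete-time SIR over temporally ordered layers; returns I_max.

    layers: list of sets of directed edges (u, v), in temporal order.
    seeds: initially infected nodes.
    beta, gamma: per-step infection and recovery probabilities.
    """
    rng = rng or random.Random(0)
    infected = set(seeds)
    recovered = set()
    i_max = len(infected)
    for edges in layers:
        newly_infected = set()
        for u, v in sorted(edges):
            susceptible = v not in infected and v not in recovered
            if u in infected and susceptible and rng.random() < beta:
                newly_infected.add(v)
        newly_recovered = {u for u in sorted(infected) if rng.random() < gamma}
        infected = (infected | newly_infected) - newly_recovered
        recovered |= newly_recovered
        i_max = max(i_max, len(infected))  # track the epidemic peak
    return i_max


chain = [{("a", "b")}, {("b", "c")}, {("c", "d")}]
peak = temporal_sir_imax(chain, {"a"}, beta=1.0, gamma=0.0)
```

With certain transmission and no recovery, the infection travels along the temporal chain and all four nodes are infected at the end, so the peak is 4.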

We are also interested in the correct ordering of the ranking methods. A suitable measure is the $\tau $-Kendall [8], which evaluates the concordance of two ordered lists. We exploit it in the following way:

1. We collect all the nodes tagged as important from every method o. We calculate $I_{max}$ for each one and rank them according to this value. This will be the overall “ground-truth” ranking list, L.
2. We select the m first nodes according to each method ranking, $R_o$, and find their position in L. This will be the method specific “ground-truth” ranking list, $L_{o}$.
3. We calculate the $\tau $-Kendall between $L_o$ and $R_o$.

This evaluation is applied for various lengths m of the most important nodes lists (see respective figures in Sect. 4).
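The three evaluation steps can be sketched as below. The sketch uses the plain tau-a form of Kendall's coefficient without tie correction, which is an assumption since the source does not state which form is used; the function names and the dict of precomputed $I_{max}$ values are likewise illustrative.

```python
def kendall_tau(x, y):
    """Kendall's tau-a between two equally long rank lists."""
    n = len(x)
    concordant = discordant = 0
    for a in range(n):
        for b in range(a + 1, n):
            sign = (x[a] - x[b]) * (y[a] - y[b])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    pairs = n * (n - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0


def method_tau(method_ranking, i_max, m):
    """Tau between a method's top-m order R_o and the induced order L_o."""
    top = method_ranking[:m]
    # Overall ground truth L: nodes sorted by decreasing I_max.
    ground_truth = sorted(i_max, key=lambda n: (-i_max[n], str(n)))
    position = {node: rank for rank, node in enumerate(ground_truth)}
    l_o = [position[node] for node in top]  # positions of the top-m nodes in L
    r_o = list(range(len(top)))             # positions in the method ranking
    return kendall_tau(r_o, l_o)


i_max = {"a": 10, "b": 8, "c": 5, "d": 1}
agree = method_tau(["a", "b", "c", "d"], i_max, m=4)
reverse = method_tau(["d", "c", "b", "a"], i_max, m=4)
```

A method whose ranking matches the order of $I_{max}$ gets a value of 1, and a fully reversed ranking gets -1.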

C Datasets

We have performed the experiments using the following publicly available real world datasets:

CollegeMsg^{Footnote 4}: It is a small temporal network corresponding to a private messaging facility at the University of California, Irvine. It consists of 1899 users and 59835 temporal edges. The timestamps are in UNIX time (seconds).
Facebook^{Footnote 5}: It consists of 855542 Facebook wall posts between 45813 unique users. The timestamps are in UNIX time (seconds).
Data Mining (DM) temporal citation network: We have used the DBLP dataset of [21] which consists of all the papers in computer science up to 2010. The timestamps are in years.

The preparation of the DM dataset was performed as follows: we have selected the papers that contain one or more authors from a list of DM experts^{Footnote 6}. Each article record contains a set of indexes to other articles which cite the current one. We use this information to create the connections between the authors citing a paper and those of the cited paper per year of citation. For example, assume that we pick a record of the form (title1, author1, author2, author3, 2009), which cites, among other papers, the following: (title2, cauthor1, cauthor2, 1993). We formulate the authors' temporal interaction patterns as: (cauthor1, author1, 2009), (cauthor1, author2, 2009), (cauthor1, author3, 2009), (cauthor2, author1, 2009), (cauthor2, author2, 2009), (cauthor2, author3, 2009). Each tuple means that the first author influenced the second author in year 2009. In this way, we construct the temporal citation network for the years 1993–2010.
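The tuple construction in the example above can be written as a short sketch. The function name and argument layout are hypothetical; parsing of the actual DBLP records is not shown.

```python
def citation_interactions(citing_authors, citing_year, cited_authors):
    """Return (cited_author, citing_author, year) tuples for one citation."""
    return [
        (cited, citing, citing_year)
        for cited in cited_authors
        for citing in citing_authors
    ]


tuples = citation_interactions(
    ["author1", "author2", "author3"], 2009, ["cauthor1", "cauthor2"]
)
```

Each cited author is paired with each citing author, and the year of the citing paper is used as the timestamp, which yields the six tuples listed in the text.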

Each of the previous datasets comes in the form (u, v, t), where u is the sender of a message or the author being cited and v is the receiver of the message or the author citing an article at time t. In order to apply the k-shell and CI methods, we aggregate the interactions into a static unweighted graph.

D Detailed Results for $\tau $-Kendall

In this section, we provide more detailed plots for the $\tau $-Kendall, by incorporating all the methods and parameters used (Figs. 9, 10 and 11).

E Comparison with Ground-Truth in DM Temporal Citation Network

In Table 2, we present the number of important network nodes identified by each method that are common with those in the ground-truth list for the case of DM temporal citation network (see Sect. 4), as m increases.

Table 2. Number of important authors identified by the methods used compared to the top-10 DM authors list (see main text) for the DM temporal citation network.

Bastas, N., Semertzidis, T., Daras, P. (2017). DepthRank: Exploiting Temporality to Uncover Important Network Nodes. In: Ciampaglia, G., Mashhadi, A., Yasseri, T. (eds) Social Informatics. SocInfo 2017. Lecture Notes in Computer Science, vol 10540. Springer, Cham. https://doi.org/10.1007/978-3-319-67256-4_12

DOI: https://doi.org/10.1007/978-3-319-67256-4_12
Published: 02 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67255-7
Online ISBN: 978-3-319-67256-4
eBook Packages: Computer Science, Computer Science (R0)