DepthRank: Exploiting Temporality to Uncover Important Network Nodes

  • Conference paper
Social Informatics (SocInfo 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10540)

Abstract

Identifying important network nodes is crucial for a variety of applications, such as the spread of an idea or an innovation. The majority of the publications so far assume that the interactions between nodes are static. However, this approach neglects that real-world phenomena evolve in time. Thus, there is a need for tools and techniques which account for evolution over time. In this direction, we present a novel graph-based method, named DepthRank (DR), that incorporates the temporal characteristics of the underlying datasets. We compare our approach against two baseline methods and find that it efficiently recovers important nodes on three real-world datasets, as indicated by the numerical simulations. Moreover, we perform our analysis on a modified version of the DBLP dataset and verify its correctness using ground-truth data.

Notes

  1. http://www.kdnuggets.com/2012/03/top-10-in-data-mining.html.

  2. The number of layers d within time window dt is not fixed because the layers may not be equally spaced.

  3. The exponential decay function is commonly used in many applications; however, we can replace it with any other monotonically decreasing function.

  4. https://snap.stanford.edu/data/CollegeMsg.html.

  5. http://konect.uni-koblenz.de/networks/facebook-wosn-wall.

  6. https://static.aminer.org/lab-datasets/expertfinding/datasets/Data-Mining.txt.

References

  1. Aggarwal, C.C., Lin, S., Yu, P.S.: On Influential Node Discovery in Dynamic Social Networks, pp. 636–647 (2012)

  2. Cai, Q., Sun, L., Niu, J., Liu, Y., Zhang, J.: Disseminating real-time messages in opportunistic mobile social networks: a ranking perspective. In: 2015 IEEE International Conference on Communications (ICC), pp. 3228–3233 (2015)

  3. van Eck, P.S., Jager, W., Leeflang, P.S.H.: Opinion leaders’ role in innovation diffusion: a simulation study. J. Prod. Innov. Manag. 28(2), 187–203 (2011)

  4. Estrada, E.: The Structure of Complex Networks: Theory and Applications. Oxford University Press, Oxford (2011)

  5. Gómez-Gardeñes, J., Echenique, P., Moreno, Y.: Immunization of real complex communication networks. Euro. Phys. J. B - Condens. Matter Complex Syst. 49(2), 259–264 (2006)

  6. Jansen, B.J., Zhang, M., Sobel, K., Chowdury, A.: Twitter power: tweets as electronic word of mouth. J. Am. Soc. Inf. Sci. Technol. 60(11), 2169–2188 (2009)

  7. Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of influence through a social network. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2003, New York, NY, USA, pp. 137–146 (2003)

  8. Kendall, M.G.: A new measure of rank correlation. Biometrika 30(1–2), 81 (1938)

  9. Kitsak, M., Gallos, L., Havlin, S., Liljeros, F., Muchnik, L., Stanley, H., Makse, H.: Identification of influential spreaders in complex networks. Nat. Phys. 6(11), 888–893 (2010)

  10. Laflin, P., Mantzaris, A.V., Ainley, F., Otley, A., Grindrod, P., Higham, D.J.: Discovering and validating influence in a dynamic online social network. Soc. Netw. Anal. Min. 3(4), 1311–1323 (2013)

  11. Lü, L., Chen, D., Ren, X.L., Zhang, Q.M., Zhang, Y.C., Zhou, T.: Vital nodes identification in complex networks. Phys. Rep. 650, 1–63 (2016)

  12. Magnien, C., Tarissan, F.: Time evolution of the importance of nodes in dynamic networks. In: 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 1200–1207 (2015)

  13. Michalski, R., Kajdanowicz, T., Bródka, P., Kazienko, P.: Seed selection for spread of influence in social networks: temporal vs. static approach. New Gener. Comput. 32(3), 213–235 (2014)

  14. Morone, F., Makse, H.: Influence maximization in complex networks through optimal percolation. Nature 524(7563), 65–68 (2015)

  15. Newman, M.: Networks: An Introduction. Oxford University Press, New York (2010)

  16. Rocha, L., Masuda, N.: Individual-based approach to epidemic processes on arbitrary dynamic contact networks. Scientific Reports 6 (2016)

  17. Rosas-Casals, M., Valverde, S., Solé, R.V.: Topological vulnerability of the European power grid under errors and attacks. Int. J. Bifurcat. Chaos 17(07), 2465–2475 (2007)

  18. Saramäki, J., Moro, E.: From seconds to months: an overview of multi-scale dynamics of mobile telephone calls. Euro. Phys. J. B 88(6), 164 (2015)

  19. Song, G., Li, Y., Chen, X., He, X., Tang, J.: Influential node tracking on dynamic social network: an interchange greedy approach. IEEE Trans. Knowl. Data Eng. 29(2), 359–372 (2017)

  20. Stehlé, J., Voirin, N., Barrat, A., Cattuto, C., Isella, L., Pinton, J.F., Quaggiotto, M., van den Broeck, W., Régis, C., Lina, B., Vanhems, P.: High-resolution measurements of face-to-face contact patterns in a primary school. PLoS ONE 6(8), e23176 (2011)

  21. Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: ArnetMiner: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, New York, NY, USA, pp. 990–998 (2008)

  22. Valdano, E., Ferreri, L., Poletto, C., Colizza, V.: Analytical computation of the epidemic threshold on temporal networks. Phys. Rev. X 5, 021005 (2015)

  23. Vestergaard, C., Génois, M.: Temporal gillespie algorithm: fast simulation of contagion processes on time-varying networks. PLoS Comput. Biol. 11(10), e1004579 (2015)

  24. Viswanath, B., Mislove, A., Cha, M., Gummadi, K.P.: On the evolution of user interaction in Facebook. In: Proceedings of the 2nd ACM Workshop on Online Social Networks, WOSN 2009, New York, NY, USA, pp. 37–42 (2009)

  25. Zhuang, H., Sun, Y., Tang, J., Zhang, J., Sun, X.: Influence maximization in dynamic social networks. In: 2013 IEEE 13th International Conference on Data Mining, pp. 1313–1318 (2013)

Acknowledgement

This research was performed under the EU’s project “Trusted, Citizen - LEA collaboration over sOcial Networks (TRILLION)” (grant agreement No 653256).

Corresponding author

Correspondence to Nikolaos Bastas.

Appendices

A Description of Method

In time-varying networks, nodes (e.g. users in a chat forum) interact with each other at given timestamps. These interactions are not homogeneously distributed; they are denser or sparser in certain time periods, depending on various factors (e.g. type of media, occurrence of an important event etc.). In the following, we aggregate these interactions using appropriate time windows; the exact values per dataset are presented in Sect. 4. We denote by t the time step in the aggregated data.

After the aggregation, the data are transformed into a set of directed unweighted graphs \(G_i\), \(i=1,\ldots ,M\), which are called layers. Some of them are shown in Fig. 8 (left). In this setting, there are two types of links: intra-links that connect two nodes in the same layer and inter-links that connect identical nodes in consecutive layers.
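As an illustration of the aggregation step, the following Python sketch groups timestamped interactions into layers. The function name `build_layers`, the fixed-width windowing and the choice to keep only non-empty windows as layers are assumptions made for this example; the window lengths actually used per dataset are those of Sect. 4.

```python
from collections import defaultdict


def build_layers(interactions, window):
    """Group (u, v, t) interactions into time-ordered layers of directed edges."""
    if window <= 0:
        raise ValueError("window must be positive")
    if not interactions:
        return []
    t0 = min(t for _, _, t in interactions)
    buckets = defaultdict(set)
    for u, v, t in interactions:
        # Index of the aggregation window that contains timestamp t.
        step = int((t - t0) // window)
        # A set keeps each directed edge once, so the layer is unweighted.
        buckets[step].add((u, v))
    # Assumption: only non-empty windows become layers, so consecutive
    # layers are not necessarily equally spaced in time.
    return [(step, buckets[step]) for step in sorted(buckets)]


# Hypothetical toy data: four interactions, aggregated in windows of length 10.
interactions = [("a", "b", 0), ("b", "c", 5), ("a", "b", 7), ("c", "a", 25)]
layers = build_layers(interactions, window=10)
```

Here the repeated interaction ("a", "b") collapses into a single intra-link of the first layer, and the empty window between time steps 0 and 2 produces no layer.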

Next, we present the method for ranking important network nodes by evaluating their “influence score” S. This is a two level procedure, incorporating (a) the calculation of the “influence score” update for each node in consecutive time steps and (b) the application of a monotonically decreasing function to account for the influence decay as time passes.

We denote with \(\varDelta H (s,G_i)\) the “influence score” update which refers to a node s residing in layer \(G_i\). It is calculated as follows: starting from \(G_i\), we specify the number of subsequent layers d within the time range \([t_i, t_i + dt]\), with dt the (fixed) length of the time window and \(t_i\) the timestep corresponding to layer \(G_i\) (see Note 2). Then, we construct every possible path from node s in layer \(G_i\) to any node w in layer \(G_{i+d}\), which passes through nodes in the intermediate layers (Fig. 8 (right)). Note that we take into account both the inter- and intra-links during path construction.

Table 1. List of notations used in the text
Fig. 8. Path construction for the DepthRank method. Left panel: nodes populate the layers (denoted by the blue dashed ellipses) according to their time ordering. Red arrows indicate the interactions between nodes in the same layer and black arrows the links between the same nodes on subsequent layers. Right panel: we start from node 1 as a source node and draw connections between the source node and its nearest neighbors in layer \(G_i\) (corresponding to time step \(t_i\)). For each node (including the source node), we proceed in the same manner for the subsequent layers, until reaching the last one within the time window \([t_i,t_i+dt]\). Red arrows stand for the directed connections between nodes in the same layer and black arrows for the connections between the identical nodes in subsequent layers (“self-links”). The dashed box indicates the level at which we perform the calculations (see text).

Each node w in layer \(G_{i+d}\) has \(r^{out}_{(w,G_{i+d})}\) outgoing links and \(r^{in}_{(w,G_{i+d-1})}\) incoming links in layer \(G_{i+d-1}\). We introduce the following relation:

$$ H(w,G_{i+d}|s,G_i) = {\left\{ \begin{array}{ll} r^{in}_{(w,G_{i+d-1})} \cdot r^{out}_{(w,G_{i+d})} , \hbox {if} \, dist((w,G_{i+d}),(s,G_{i}))<2 \cdot d +1 \\ 0 ,\hbox {otherwise} \end{array}\right. } $$

which couples path diversity (\(r^{in}_{(w,G_{i+d-1})}\)) and transmission efficiency (\(r^{out}_{(w,G_{i+d})}\)), to denote the portion of the update \(\varDelta H\) attributed to node w. Thus, \(\varDelta H (s,G_{i})\) is given by:

$$\begin{aligned} \varDelta H(s,G_{i}) = \sum _{w \in G_{i+d}}{H(w,G_{i+d}|s,G_{i})} \end{aligned}$$

Referring to Fig. 8 (right), we have at layer \(i+3\) (i is the top layer): \(H(2,G_{i+3}|1,G_{i}) = 2 \cdot 1 = 2\), \(H(1,G_{i+3}|1,G_{i}) = 2 \cdot 2 = 4\), \(H(3,G_{i+3}|1,G_{i}) = 2 \cdot 1 = 2\), \(H(4,G_{i+3}|1,G_{i}) = 1 \cdot 1 = 1\), \(H(5,G_{i+3}|1,G_{i}) = 2 \cdot 1 = 2\) and \(H(6,G_{i+3}|1,G_{i}) = 1 \cdot 2 = 2\). Thus, for node \(s=1\) in layer \(G_i\), the update is \(\varDelta H(1,G_{i}) = 13\).
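The arithmetic of this worked example can be checked with a short Python sketch. The function name `delta_h` and the dictionary layout are assumptions for illustration. The link counts are the factors quoted above, whereas the distances are placeholder values (Fig. 8 is not available here), all assumed to satisfy the bound \(dist < 2 \cdot d + 1\).

```python
def delta_h(r_in, r_out, dist, d):
    """Sum H(w, G_{i+d} | s, G_i) over the nodes w of the last layer.

    r_in[w]  : incoming links of w in layer G_{i+d-1}
    r_out[w] : outgoing links of w in layer G_{i+d}
    dist[w]  : distance from (s, G_i) to (w, G_{i+d}); missing if unreachable
    """
    total = 0
    for w in r_out:
        # H is non-zero only when w is close enough to the source node.
        if w in dist and dist[w] < 2 * d + 1:
            total += r_in.get(w, 0) * r_out[w]
    return total


# Factors of the worked example (source node s = 1, d = 3).
r_in = {1: 2, 2: 2, 3: 2, 4: 1, 5: 2, 6: 1}
r_out = {1: 2, 2: 1, 3: 1, 4: 1, 5: 1, 6: 2}
# Placeholder distances, assumed to be below the bound 2 * 3 + 1 = 7.
dist = {w: 3 for w in r_out}

update = delta_h(r_in, r_out, dist, d=3)  # 2 + 4 + 2 + 1 + 2 + 2 = 13
```

A node whose distance reaches the bound contributes nothing, so moving node 6 to distance 7 would lower the update to 11.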

The “influence score” \(S(s,G_{i})\) of node s up to layer \(G_i\) is the sum of the update \(\varDelta H(s,G_{i})\) and its previous value \(S(s,G_{j})\) in layer \(G_j\) - which corresponds to the time step \(t_j^s\) of its most recent update - weighted using a forgetting function \(g(t_i-t_j^s)\) to account for aging effects:

$$\begin{aligned} S(s,G_i) = \varDelta H(s,G_{i}) + S(s,G_{j}) \cdot g(t_i-t_j^s) \end{aligned}$$

In the following, we consider three different variants: no forgetting mechanism (\(g(t_i-t_j^s)=1\)), forgetting mechanism (\(g(t_i-t_j^s)=e^{-(t_i-t_j^s)}\)) and forgetting mechanism with normalization (\(g(t_i-t_j^s)=e^{-(t_i-t_j^s)/T}\)), with T the maximum time step of the aggregated dataset (see Note 3). Thus, the score \(S(s,G_i)\) can be formulated as follows:

  • No forgetting mechanism:

    $$\begin{aligned} S(s,G_i) = \varDelta H(s,G_i) + S(s,G_j) \end{aligned}$$
    (4)
  • Forgetting mechanism:

    $$\begin{aligned} S(s,G_i) = \varDelta H(s,G_i) + S(s,G_j) \cdot e^{-(t_i-t_j^s)} \end{aligned}$$
    (5)
  • Forgetting mechanism with normalization:

    $$\begin{aligned} S(s,G_i) = \varDelta H(s,G_i) + S(s,G_j) \cdot e^{-(t_i-t_j^s)/T} \end{aligned}$$
    (6)

The procedure ends when we reach the last layer for which \(t \le T-dt\). The overall “influence score” S for node s in the underlying dataset is given by (remember that \(G_j\) denotes the layer of the most recent update):

  • No forgetting mechanism:

    $$\begin{aligned} S(s) = S(s,G_j) \end{aligned}$$
    (7)
  • Forgetting mechanism:

    $$\begin{aligned} S(s) = S(s,G_j) \cdot e^{-(T-t^s_j)} \end{aligned}$$
    (8)
  • Forgetting mechanism with normalization:

    $$\begin{aligned} S(s) = S(s,G_j) \cdot e^{-(T-t^s_j)/T} \end{aligned}$$
    (9)

We finally rank the nodes in descending order according to their S value. In Algorithm 1, we provide the pseudo-code for our method, including the three variants defined previously: DepthRank (DR), where we do not impose any forgetting mechanism (Eqs. (4) and (7)), DepthRank with forgetting mechanism (DR-F) (Eqs. (5) and (8)) and DepthRank where we impose normalization (DR-NF) (Eqs. (6) and (9)).

[Algorithm 1: pseudo-code of DepthRank and its variants DR, DR-F and DR-NF]
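The score recursion and the final ranking of Eqs. (4)–(9) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' Algorithm 1: the names `forgetting`, `overall_score` and `rank_nodes` are hypothetical, and the updates \(\varDelta H\) are taken as already computed and supplied per node as time-ordered pairs \((t_i, \varDelta H)\).

```python
import math


def forgetting(variant, elapsed, T):
    """Forgetting function g for the variants DR, DR-F and DR-NF."""
    if variant == "DR":
        return 1.0                      # no forgetting, Eqs. (4) and (7)
    if variant == "DR-F":
        return math.exp(-elapsed)       # Eqs. (5) and (8)
    if variant == "DR-NF":
        return math.exp(-elapsed / T)   # Eqs. (6) and (9)
    raise ValueError(f"unknown variant: {variant}")


def overall_score(updates, T, variant):
    """Overall score S(s) from a time-ordered list of (t_i, delta_H) pairs."""
    score, last_t = 0.0, None
    for t, dh in updates:
        elapsed = 0 if last_t is None else t - last_t
        # S(s, G_i) = delta_H + S(s, G_j) * g(t_i - t_j)
        score = dh + score * forgetting(variant, elapsed, T)
        last_t = t
    if last_t is None:
        return 0.0
    # Age the score of the most recent update up to the final time step T.
    return score * forgetting(variant, T - last_t, T)


def rank_nodes(updates_by_node, T, variant):
    """Nodes in descending order of their overall score S."""
    scores = {s: overall_score(u, T, variant) for s, u in updates_by_node.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical updates: node "a" is active early, node "b" late (T = 10).
updates_by_node = {"a": [(1, 13), (2, 1)], "b": [(9, 5)]}
ranking_dr = rank_nodes(updates_by_node, T=10, variant="DR")
ranking_drf = rank_nodes(updates_by_node, T=10, variant="DR-F")
```

In this toy input the variant without forgetting ranks the early but strong node "a" first, while DR-F prefers the recently active node "b", which is the qualitative effect the forgetting function is meant to capture.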

B Baseline Methods and Evaluation Metrics

In this section, we present the baseline methods along with the evaluation metrics. For the former, we have chosen k-shell (kS) and Collective Influence (CI) methods because they are widely employed in the context of graph-based techniques.

k-shell is broadly used for ranking purposes. Given a graph G, it proceeds as follows: every isolated node is assigned to the 0-shell. Among the remaining nodes, which have connectivity \(k \ge 1\), one first removes every node with \(k = 1\) together with its links. The residual graph may again contain nodes with connectivity \(k \le 1\); thus, the same procedure is repeated until no nodes with \(k \le 1\) remain. The removed nodes constitute the 1-shell. The rest of the shells are uncovered in the same way, with the threshold increased by one each time. The nodes belonging to the highest shells are the most important [9].
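The pruning procedure can be sketched in plain Python as follows. The function name `k_shell` and the edge-list input are assumptions for illustration, and the graph is treated as undirected and unweighted.

```python
def k_shell(edges, nodes=()):
    """Return a dict mapping each node to its shell index."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u == v:
            continue  # self-loops do not count towards connectivity
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    degree = {n: len(nb) for n, nb in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        peel = [n for n in remaining if degree[n] <= k]
        if not peel:
            k += 1  # nothing left to prune at this level, move to next shell
            continue
        while peel:
            n = peel.pop()
            if n not in remaining:
                continue  # already removed through another neighbour
            remaining.discard(n)
            shell[n] = k
            for nb in adj[n]:
                if nb in remaining:
                    # Removing n lowers the residual degree of its neighbours,
                    # which may push them into the current shell as well.
                    degree[nb] -= 1
                    if degree[nb] <= k:
                        peel.append(nb)
    return shell


# Hypothetical graph: a triangle a-b-c, a pendant node d and an isolated node e.
shells = k_shell([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")], nodes=["e"])
```

For this toy graph the isolated node lands in the 0-shell, the pendant node in the 1-shell and the triangle in the 2-shell.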

Collective Influence is based on optimal percolation. According to [14], it takes the form:

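$$\begin{aligned} CI_l(i) = (r_i - 1) \sum _{j \in \partial Ball(i,l)}{(r_j - 1)} \end{aligned}$$
(10)

where \(r_i\) is the degree of node i and \(\partial Ball(i,l)\) is the front of a sphere centered at the i-th node with radius l (in terms of shortest path distance). We follow the suggestions of the authors in [14] and set \(l=2\) and \(l=3\) in the calculations.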
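A minimal Python sketch of this quantity is given below, assuming the form proposed in [14], namely \((r_i - 1)\) multiplied by the sum of \((r_j - 1)\) over the nodes j at shortest-path distance exactly l from node i. The function name `collective_influence` and the adjacency-dictionary input are assumptions for illustration.

```python
from collections import deque


def collective_influence(adj, i, l):
    """Collective Influence of node i for radius l on an undirected graph.

    adj maps every node to the set of its neighbours.
    """
    dist = {i: 0}
    queue = deque([i])
    frontier_sum = 0
    while queue:
        u = queue.popleft()
        if dist[u] == l:
            # u lies on the front of the sphere: add its reduced degree
            # and do not expand the search any further from it.
            frontier_sum += len(adj[u]) - 1
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj[i]) - 1) * frontier_sum


# Hypothetical small tree used only to exercise the function.
adj = {1: {2, 3}, 2: {1, 4}, 3: {1, 5, 6}, 4: {2}, 5: {3}, 6: {3}}
ci_node2 = collective_influence(adj, 2, l=2)  # (2 - 1) * (3 - 1) = 2
```

The breadth-first search guarantees that each node is assigned its shortest-path distance, so only nodes exactly l hops away contribute to the sum.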

To assess the spreading ability of the important network nodes uncovered by each method, we apply a temporal implementation of the SIR model, called Temporal Gillespie Algorithm [23] (we choose the homogeneous Poisson version), to account for the time-varying interactions in real-world networks. We have slightly modified the process in order to start from a set of initially infected nodes rather than a single one. These sets coincide with the ordered sets identified by the methods used. The efficiency of spreading is evaluated by using the maximum number of infected nodes within the whole dataset, \(I_{max} = \max \limits _{1 \le t \le T}{I(t)}\). The infection and recovery rates are denoted by \(\beta \) and \(\gamma \), respectively. For the small CollegeMsg dataset, we have calculated the epidemic threshold for a fixed \(\gamma \) value as in [22]; for the rest of the datasets, the parameters were chosen heuristically, by searching for those values of \(\beta \) and \(\gamma \) for which a considerable amount of spreading is observed. The exact values are displayed in the figure captions in Sect. 4.
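To make the role of \(I_{max}\) concrete, the sketch below runs a simplified discrete-time SIR process over the time-ordered interactions. It is a stand-in chosen for brevity and not the Temporal Gillespie Algorithm of [23]. The function name `temporal_sir_imax`, the treatment of \(\beta \) and \(\gamma \) as per-step probabilities, and the assumption that infection travels from u to v along an interaction (u, v, t) are all choices made for this example.

```python
import random


def temporal_sir_imax(interactions, seeds, beta, gamma, rng=None):
    """Maximum number of simultaneously infected nodes, I_max."""
    rng = rng or random.Random(0)
    by_step = {}
    for u, v, t in interactions:
        by_step.setdefault(t, []).append((u, v))
    infected, recovered = set(seeds), set()
    i_max = len(infected)
    # Simplification: only time steps that contain interactions are visited.
    for t in sorted(by_step):
        newly = set()
        for u, v in by_step[t]:
            susceptible = v not in infected and v not in recovered
            if u in infected and susceptible and rng.random() < beta:
                newly.add(v)
        # Nodes infected before this step may recover with probability gamma;
        # sorting keeps the random draws reproducible across runs.
        healed = {n for n in sorted(infected) if rng.random() < gamma}
        infected = (infected - healed) | newly
        recovered |= healed
        i_max = max(i_max, len(infected))
    return i_max


# Hypothetical chain of interactions, seeded with node "a".
chain = [("a", "b", 1), ("b", "c", 2), ("c", "d", 3)]
full_spread = temporal_sir_imax(chain, seeds=["a"], beta=1.0, gamma=0.0)
```

With certain infection and no recovery the whole chain becomes infected, so `full_spread` equals 4; intermediate values of the two parameters give stochastic outcomes that would normally be averaged over many runs.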

We are also interested in the correct ordering of the ranking methods. A suitable measure is the \(\tau \)-Kendall [8], which evaluates the concordance of two ordered lists. We exploit it in the following way:

  • We collect all the nodes tagged as important from every method o. We calculate \(I_{max}\) for each one and rank them according to this value. This will be the overall “ground-truth” ranking list, L.

  • We select the m first nodes according to each method ranking, \(R_o\), and find their position in L. This will be the method specific “ground-truth” ranking list, \(L_{o}\).

  • We calculate the \(\tau \)-Kendall between \(L_o\) and \(R_o\).

This evaluation is applied for various lengths m of the most important nodes lists (see respective figures in Sect. 4).
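The three steps above can be sketched in Python as follows. The helper names `kendall_tau` and `method_tau` are hypothetical, the coefficient is computed without tie correction, and \(L_o\) is interpreted as the list of positions in L of the top-m nodes of a method, which is one reading of the procedure.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall rank correlation (no tie correction) of two equally long lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2")
    concordant = discordant = 0
    for a, b in combinations(range(n), 2):
        sign = (x[a] - x[b]) * (y[a] - y[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def method_tau(i_max, method_ranking, m):
    """Concordance between a method's top-m ranking and the ranking by I_max."""
    # Overall "ground-truth" list L: nodes in decreasing order of I_max.
    L = sorted(i_max, key=i_max.get, reverse=True)
    position_in_L = {node: p for p, node in enumerate(L)}
    top = method_ranking[:m]
    R_o = list(range(len(top)))                  # positions given by the method
    L_o = [position_in_L[node] for node in top]  # positions of the same nodes in L
    return kendall_tau(R_o, L_o)


# Hypothetical I_max values and a method ranking that swaps nodes "b" and "c".
i_max = {"a": 40, "b": 30, "c": 20, "d": 10}
tau = method_tau(i_max, ["a", "c", "b", "d"], m=4)
```

In this toy case five of the six node pairs are ordered consistently and one is not, giving a coefficient of (5 - 1) / 6.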

C Datasets

We have performed the experiments using the following publicly available real world datasets:

  • CollegeMsg (see Note 4): It is a small temporal network corresponding to a private messaging facility at the University of California, Irvine. It consists of 1899 users and 59835 temporal edges. The timestamps are in UNIX time (seconds).

  • Facebook (see Note 5): It consists of 855542 Facebook wall posts between 45813 unique users. The timestamps are in UNIX time (seconds).

  • Data Mining (DM) temporal citation network: We have used the DBLP dataset of [21] which consists of all the papers in computer science up to 2010. The timestamps are in years.

The preparation of the DM dataset was performed as follows: we have selected the papers that contain one or more authors from a list of DM experts (see Note 6). Each article record contains a set of indexes to other articles which cite the current one. We use this information to create the connections between the authors citing a paper and those of the cited paper per year of citation. For example, assume that we pick a record of the form: (title1, author1, author2, author3, 2009) which cites among other papers the following: (title2, cauthor1, cauthor2, 1993). We formulate the authors' temporal interaction patterns as: (cauthor1, author1, 2009), (cauthor1, author2, 2009), (cauthor1, author3, 2009), (cauthor2, author1, 2009), (cauthor2, author2, 2009), (cauthor2, author3, 2009). Each tuple means that the first author influenced the second author in year 2009. In this way, we construct the temporal citation network for the years 1993–2010.
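The expansion of a single citation into author-level interactions can be written as a short Python sketch. The function name `citation_interactions` is hypothetical; the inputs reuse the example record above.

```python
def citation_interactions(citing_authors, cited_authors, year):
    """Pair every cited author with every citing author for the citation year."""
    return [
        (cited, citing, year)  # the cited author influences the citing author
        for cited in cited_authors
        for citing in citing_authors
    ]


tuples = citation_interactions(
    citing_authors=["author1", "author2", "author3"],
    cited_authors=["cauthor1", "cauthor2"],
    year=2009,
)
```

The call yields the six tuples listed in the text, in the same order, with the year of the citing paper (2009) rather than that of the cited paper (1993) as the timestamp.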

Each of the previous datasets comes in the form (u, v, t), where u is the sender of a message or the author being cited and v is the receiver of the message or the author citing an article at time t. In order to apply the k-shell and CI methods, we aggregate the interactions into a static unweighted graph.

D Detailed Results for \(\tau \)-Kendall

In this section, we provide more detailed plots for the \(\tau \)-Kendall, by incorporating all the methods and parameters used (Figs. 9, 10 and 11).

Fig. 9. Plot of the \(\tau \)-Kendall as a function of the number m of the most important (top) nodes for the CollegeMsg dataset, for \(\gamma =0.01\) and (a) \(\beta =0.02\) and (b) \(\beta =0.05\). The methods are shown in the legends.

Fig. 10. Plot of the \(\tau \)-Kendall as a function of the number m of the most important (top) nodes for the Facebook dataset, for \(\beta =0.15\) and (a) \(\gamma =0.01\) and (b) \(\gamma =0.001\). The methods are shown in the legends.

Fig. 11. Plot of the \(\tau \)-Kendall as a function of the number m of the most important (top) nodes for the DM temporal citation network, for \(\gamma =0.05\) and (a) \(\beta =0.004\) and (b) \(\beta =0.04\). The methods are shown in the legends.

E Comparison with Ground-Truth in DM Temporal Citation Network

In Table 2, we present the number of important network nodes identified by each method that are common with those in the ground-truth list for the case of DM temporal citation network (see Sect. 4), as m increases.

Table 2. Number of important authors identified by the methods used compared to the top-10 DM authors list (see main text) for the DM temporal citation network.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Bastas, N., Semertzidis, T., Daras, P. (2017). DepthRank: Exploiting Temporality to Uncover Important Network Nodes. In: Ciampaglia, G., Mashhadi, A., Yasseri, T. (eds) Social Informatics. SocInfo 2017. Lecture Notes in Computer Science, vol 10540. Springer, Cham. https://doi.org/10.1007/978-3-319-67256-4_12

  • DOI: https://doi.org/10.1007/978-3-319-67256-4_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67255-7

  • Online ISBN: 978-3-319-67256-4

  • eBook Packages: Computer Science, Computer Science (R0)
