Abstract
This paper introduces a novel technique for tracking structures in time-varying graphs. The method uses a maximum a posteriori approach to fit a three-dimensional co-clustering of the source vertices, the destination vertices and the time to the data under study, without requiring any hyper-parameter tuning. The three dimensions are segmented simultaneously in order to build clusters of source vertices, clusters of destination vertices and time segments such that the edge distributions across clusters of vertices follow the same evolution over the time segments. The main novelty of this approach is that the time segments are inferred directly from the evolution of the edge distribution between the vertices, so that the user does not have to perform any a priori quantization of time. Experiments conducted on artificial data illustrate the good behavior of the technique, and a study of a real-life data set shows the potential of the proposed approach for exploratory data analysis.
Notes
To avoid confusion, we denote by \(\nu \) the number of edges when it is a parameter of the model, and by \(m\) the number of edges in a given data set.
Transport for London, http://www.tfl.gov.uk.
On a standard desktop PC, this takes approximately 50 min, with a maximal memory occupation of 4.5 GB.
References
Bekkerman R, El-Yaniv R, McCallum A (2005) Multi-way distributional clustering via pairwise interactions. In: ICML, pp 41–48
Borgatti SP (1988) A comment on Doreian’s regular equivalence in symmetric structures. Soc Netw 10:265–271
Boullé M (2011) Data grid models for preparation and modeling in supervised learning. In: Guyon I, Cawley G, Dror G, Saffari A (eds) Hands-on pattern recognition: challenges in machine learning, vol 1. Microtome Publishing, pp 99–130
Casteigts A, Flocchini P, Quattrociocchi W, Santoro N (2012) Time-varying graphs and dynamic networks. Int J Parallel Emerg Distrib Syst 27(5):387–408. doi:10.1080/17445760.2012.668546
Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley, New York
Dhillon IS, Mallela S, Modha D (2003) Information-theoretic co-clustering. In: KDD ’03, pp 89–98
Erdős P, Rényi A (1959) On random graphs. I. Publ Math Debrecen 6:290–297
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3):75–174
Goldenberg A, Zheng AX, Fienberg S, Airoldi EM (2009) A survey of statistical network models. Found Trends Mach Learn 2(2):129–233
Grünwald P (2007) The minimum description length principle. MIT Press, Cambridge
Guigourès R, Boullé M, Rossi F (2012) A triclustering approach for time evolving graphs. In: Co-clustering and applications, IEEE 12th international conference on data mining workshops (ICDMW 2012), Brussels, Belgium, pp 115–122. doi:10.1109/ICDMW.2012.61
Hansen P, Mladenovic N (2001) Variable neighborhood search: principles and applications. Eur J Oper Res 130(3):449–467
Hartigan J (1972) Direct clustering of a data matrix. J Am Stat Assoc 67(337):123–129
Hintze JL, Nelson RD (1998) Violin plots: a box plot-density trace synergism. Am Stat 52(2):181–184. doi:10.1080/00031305.1998.10480559
Hopcroft J, Khan O, Kulis B, Selman B (2004) Tracking evolving communities in large linked networks. PNAS 101:5249–5253
Kemp C, Tenenbaum J (2006) Learning systems of concepts with an infinite relational model. In: AAAI’06
Lang KJ (2009) Information theoretic comparison of stochastic graph models: some experiments. In: WAW, pp 1–12
Li Y, Jain A (1998) Classification of text documents. Comput J 41(8):537–546
Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory 37:145–151
Murphy KP (2012) Machine learning: a probabilistic perspective. MIT Press, Cambridge
Nadel SF (1957) The theory of social structure. Cohen & West, London
Nadif M, Govaert G (2010) Model-based co-clustering for continuous data. In: ICMLA, pp 175–180
Nowicki K, Snijders T (2001) Estimation and prediction for stochastic blockstructures. J Am Stat Assoc 96:1077–1087
Palla G, Derenyi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818
Palla G, Barabási AL, Vicsek T (2007) Quantifying social group evolution. Nature 446:664–667
Rege M, Dong M, Fotouhi F (2006) Co-clustering documents and words using bipartite isoperimetric graph partitioning. In: ICDM, pp 532–541
Schaeffer S (2007) Graph clustering. Comput Sci Rev 1(1):27–64
Schepers J, Van Mechelen I, Ceulemans E (2006) Three-mode partitioning. Comput Stat Data Anal 51(3):1623–1642
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423
Slonim N, Tishby N (1999) Agglomerative information bottleneck. Adv Neural Inf Process Syst 12:617–623
Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. JMLR 3:583–617
Sun J, Faloutsos C, Papadimitriou S, Yu P (2007) Graphscope: parameter-free mining of large time-evolving graphs. In: KDD ’07, pp 687–696
Van Mechelen I, Bock HH, De Boeck P (2004) Two-mode clustering methods: a structured overview. Stat Methods Med Res 13(5):363–394
White DR, Reitz KP (1983) Graph and semigroup homomorphisms on networks of relations. Soc Netw 5(2):193–234
White H, Boorman S, Breiger R (1976) Social structure from multiple networks: I. Blockmodels of roles and positions. Am J Sociol 81(4):730–780
Xing EP, Fu W, Song L (2010) A state-space mixed membership blockmodel for dynamic network tomography. Ann Appl Stat 4(2):535–566
Zhao L, Zaki M (2005) Tricluster: an effective algorithm for mining coherent clusters in 3d microarray data. In: SIGMOD conference, pp 694–705
Acknowledgments
The authors thank the anonymous reviewers and the associate editor for their valuable comments, which helped improve this paper.
Appendix 1: Interpretations of the dissimilarity between two clusters
Interestingly, the dissimilarity given in Definition 3 admits several interpretations. It corresponds to a loss of coding length (when the MODL criterion is interpreted as a description length), to a loss of posterior probability of the triclustering given the data (see Proposition 1), and, asymptotically, to a divergence between probability distributions associated with the clusters (see Proposition 2).
Proposition 1
The exponential of the dissimilarity between two clusters, \(c_1\) and \(c_2\), gives the inverse ratio between the probability of the simplified triclustering given the data set and the probability of the original triclustering given the data set:
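The display equation accompanying Proposition 1 is missing from this copy of the text. From the prose, its likely form is the following, where the notation is assumed: \(\Delta (c_1, c_2)\) denotes the dissimilarity, \(\mathcal {M}\) the original triclustering, \(\mathcal {M}^*\) the simplified triclustering obtained by merging \(c_1\) and \(c_2\), and \(D\) the data set:

```latex
\exp\bigl(\Delta(c_1, c_2)\bigr)
  = \frac{P(\mathcal{M} \mid D)}{P(\mathcal{M}^* \mid D)}.
```

Since merging two clusters can only decrease the posterior probability of the model, this ratio is at least one, consistent with a non-negative dissimilarity.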
Asymptotically, i.e., when the number of edges tends to infinity, the dissimilarity between two clusters is proportional to a generalized Jensen–Shannon divergence between the two distributions that characterize the clusters in the triclustering structure. To simplify the discussion, we give the definition and result only for the case of source clusters; they generalize directly to the two other cases.
Definition 5
Let \(\mathcal {M}\) be a triclustering. For all \(i\in \{1,\ldots ,k_S\}\) we denote
The matrix \(\mathbb {P}^S_i\) can be interpreted as a probability distribution over \(\{1, \ldots , k_D\}\times \{1, \ldots , k_T\}\). It characterizes \(c^S_i\) as a cluster of source vertices as seen from clusters of destination vertices and of time stamps.
We denote \(\mathbb {P}^S\) the associated marginal probability distribution obtained by
Obviously, we have
where
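The display equations of Definition 5 and the identity that follows are missing from this copy. A plausible reconstruction, assuming \(m_{ijl}\) counts the edges from source cluster \(c^S_i\) to destination cluster \(c^D_j\) during time segment \(c^T_l\) and \(m\) is the total number of edges, is:

```latex
\mathbb{P}^S_i(j, l) = \frac{m_{ijl}}{\sum_{j'=1}^{k_D} \sum_{l'=1}^{k_T} m_{ij'l'}},
\qquad
\mathbb{P}^S = \sum_{i=1}^{k_S} \pi_i \, \mathbb{P}^S_i,
\qquad
\pi_i = \frac{1}{m} \sum_{j=1}^{k_D} \sum_{l=1}^{k_T} m_{ijl},
```

so that \(\mathbb{P}^S\) is the mixture of the cluster-conditional distributions \(\mathbb{P}^S_i\) with weights \(\pi_i\), consistent with the mixture coefficients \(\pi_i\) used in Proposition 2.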
Proposition 2
Let \(\mathcal {M}\) be a triclustering and let \(c^S_i\) and \(c^S_k\) be two source clusters. Then
with
and where \(\alpha _i\) and \(\alpha _k\) are the normalized mixture coefficients, such that \(\alpha _i = \frac{\pi _i}{\pi _i+\pi _k}\) and \(\alpha _k = \frac{\pi _k}{\pi _i+\pi _k}\).
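The displayed formula of Proposition 2 is missing from this copy. The generalized Jensen–Shannon divergence it refers to is standard (Lin 1991) and, for weights \(\alpha _i + \alpha _k = 1\) and Shannon entropy \(H\), reads:

```latex
JS_{\alpha_i, \alpha_k}\bigl(\mathbb{P}^S_i, \mathbb{P}^S_k\bigr)
  = H\bigl(\alpha_i \mathbb{P}^S_i + \alpha_k \mathbb{P}^S_k\bigr)
    - \alpha_i H\bigl(\mathbb{P}^S_i\bigr)
    - \alpha_k H\bigl(\mathbb{P}^S_k\bigr),
```

which can equivalently be written as \(\alpha _i \, KL(\mathbb{P}^S_i \Vert \mathbb{M}) + \alpha _k \, KL(\mathbb{P}^S_k \Vert \mathbb{M})\), with \(\mathbb{M} = \alpha _i \mathbb{P}^S_i + \alpha _k \mathbb{P}^S_k\) the mixture of the two distributions.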
Proof
JS is the generalized Jensen–Shannon divergence (Lin 1991) and KL the Kullback–Leibler divergence. The full proof, omitted for brevity, relies on the Stirling approximation \(\log n! = n \log n - n + O(\log n)\), applied when computing the difference between the criterion values after and before the merge. \(\square \)
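As a numerical illustration of the approximation used in the proof, the following sketch (Python, not part of the paper) checks that the error of the leading Stirling terms grows only logarithmically in \(n\):

```python
import math

def log_factorial(n):
    """Exact value of log n! via the log-gamma function: log n! = lgamma(n + 1)."""
    return math.lgamma(n + 1)

def stirling(n):
    """Leading terms of the Stirling approximation: n log n - n."""
    return n * math.log(n) - n

# The error log n! - (n log n - n) is O(log n); its exact leading
# term is 0.5 * log(2 * pi * n), which stays well below log n + 1.
for n in (10, 100, 1000, 10000):
    error = log_factorial(n) - stirling(n)
    print(n, round(error, 3))
```

For instance, at \(n = 1000\) the error is about 4.37 while \(\log 1000 \approx 6.91\), so the \(O(\log n)\) remainder is indeed negligible relative to \(n \log n\).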
The Jensen–Shannon divergence has some appealing properties: it is a symmetric, non-negative divergence measure between two probability distributions, and it is equal to zero for two identical distributions. While it is not a metric, since it does not satisfy the triangle inequality, it nevertheless has the minimal properties needed to serve as a dissimilarity measure within an agglomerative process in the context of co-clustering (Slonim and Tishby 1999).
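The properties listed above can be checked on a small example. The following sketch (Python, not part of the paper; the function names are ours) implements the generalized Jensen–Shannon divergence of Lin (1991) directly from its entropy-based definition:

```python
import math

def entropy(p):
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def js_divergence(p, q, alpha=0.5):
    """Generalized Jensen-Shannon divergence JS_{alpha, 1-alpha}(p, q):
    H(alpha*p + (1-alpha)*q) - alpha*H(p) - (1-alpha)*H(q)."""
    beta = 1.0 - alpha
    mixture = [alpha * x + beta * y for x, y in zip(p, q)]
    return entropy(mixture) - alpha * entropy(p) - beta * entropy(q)

p = [0.5, 0.3, 0.2]
q = [0.1, 0.2, 0.7]

print(js_divergence(p, q))   # strictly positive for distinct distributions
print(js_divergence(p, p))   # zero for identical distributions
# With alpha = 0.5 the divergence is symmetric in its two arguments.
```

Concavity of the entropy guarantees non-negativity, and equality of the two arguments makes the mixture coincide with both, giving a zero divergence.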
Guigourès, R., Boullé, M. & Rossi, F. Discovering patterns in time-varying graphs: a triclustering approach. Adv Data Anal Classif 12, 509–536 (2018). https://doi.org/10.1007/s11634-015-0218-6