Discovering patterns in time-varying graphs: a triclustering approach

Guigourès, Romain; Boullé, Marc; Rossi, Fabrice

doi:10.1007/s11634-015-0218-6

Discovering patterns in time-varying graphs: a triclustering approach

Regular Article
Published: 13 October 2015

Volume 12, pages 509–536, (2018)
Cite this article

Advances in Data Analysis and Classification Aims and scope Submit manuscript

Romain Guigourès¹,
Marc Boullé¹ &
Fabrice Rossi²

531 Accesses
13 Citations
5 Altmetric
Explore all metrics

Abstract

This paper introduces a novel technique to track structures in time varying graphs. The method uses a maximum a posteriori approach for adjusting a three-dimensional co-clustering of the source vertices, the destination vertices and the time, to the data under study, in a way that does not require any hyper-parameter tuning. The three dimensions are simultaneously segmented in order to build clusters of source vertices, destination vertices and time segments where the edge distributions across clusters of vertices follow the same evolution over the time segments. The main novelty of this approach lies in that the time segments are directly inferred from the evolution of the edge distribution between the vertices, thus not requiring the user to make any a priori quantization. Experiments conducted on artificial data illustrate the good behavior of the technique, and a study of a real-life data set shows the potential of the proposed approach for exploratory data analysis.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Density-Based Clustering Based on Hierarchical Density Estimates

A survey of methods for time series change point detection

Article 08 September 2016

Samaneh Aminikhanghahi & Diane J. Cook

A review and evaluation of elastic distance functions for time series clustering

Article Open access 07 September 2023

Christopher Holder, Matthew Middlehurst & Anthony Bagnall

Notes

To avoid confusion, we denote $\nu $ the number of edges as a parameter of the model and $m$ the number of edges in a given data set.
Transport for London, http://www.tfl.gov.uk.
On a standard desktop PC, this takes approximately 50 min, with a maximal memory occupation of 4.5 GB.

References

Bekkerman R, El-Yaniv R, McCallum A (2005) Multi-way distributional clustering via pairwise interractions. In: ICML, pp 41–48
Borgatti SP (1988) A comment on Doreian’s regular equivalence in symmetric structures. Soc Netw 10:265–271
Article MathSciNet Google Scholar
Boullé M (2011) Data grid models for preparation and modeling in supervised learning. In: Guyon I, Cawley G, Dror G, Saffari A (eds) Hands-on pattern recognition: challenges in machine learning, vol 1. Microtome Publishing, pp 99–130
Casteigts A, Flocchini P, Quattrociocchi W, Santoro N (2012) Time-varying graphs and dynamic networks. Int J Parallel Emerg Distrib Syst 27(5):387–408. doi:10.1080/17445760.2012.668546
Article Google Scholar
Cover TM, Thomas JA (2006) Elements of information theory, 2nd edn. Wiley, New York
MATH Google Scholar
Dhillon IS, Mallela S, Modha D (2003) Information-theoretic co-clustering. In: KDD ’03, pp 89–98
Erdős P, Rényi A (1959) On random graphs. I. Publ Math 6:290–297
MathSciNet MATH Google Scholar
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3):75–174
Article MathSciNet Google Scholar
Goldenberg A, Zheng AX, Fienberg S, Airoldi EM (2009) A survey of statistical network models. Found Trends Mach Learn 2(2):129–233
Article Google Scholar
Grünwald P (2007) The minimum description length principle. Mit Press, Cambridge
Google Scholar
Guigourès R, Boullé M, Rossi F (2012) A triclustering approach for time evolving graphs. In: Co-clustering and applications, IEEE 12th international conference on data mining workshops (ICDMW 2012), Brussels, Belgium, pp 115–122. doi:10.1109/ICDMW.2012.61
Hansen P, Mladenovic N (2001) Variable neighborhood search: principles and applications. Eur J Oper Res 130(3):449–467
Article MathSciNet Google Scholar
Hartigan J (1972) Direct clustering of a data matrix. J Am Stat Assoc 67(337):123–129
Article Google Scholar
Hintze JL, Nelson RD (1998) Violin plots: a box plot-density trace synergism. Am Stat 52(2):181–184. doi:10.1080/00031305.1998.10480559
Article Google Scholar
Hopcroft J, Khan O, Kulis B, Selman B (2004) Tracking evolving communities in large linked networks. PNAS 101:5249–5253
Article Google Scholar
Kemp C, Tenenbaum J (2006) Learning systems of concepts with an infinite relational model. In: AAAI’06
Lang KJ (2009) Information theoretic comparison of stochastic graph models: some experiments. In: WAW, pp 1–12
Li Y, Jain A (1998) Classification of text documents. Comput J 41(8):537–546
Article Google Scholar
Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inf Theory 37:145–151
Article MathSciNet Google Scholar
Murphy KP (2012) Machine learning: a probabilistic perspective. MIT Press, Cambridge
MATH Google Scholar
Nadel SF (1957) The theory of social structure. Cohen & West, London
Google Scholar
Nadif M, Govaert G (2010) Model-based co-clustering for continuous data. In: ICMLA, pp 175–180
Nowicki K, Snijders T (2001) Estimation and prediction for stochastic blockstructures. J Am Stat Assoc 96:1077–1087
Article MathSciNet Google Scholar
Palla G, Derenyi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818
Article Google Scholar
Palla G, Barabási AL, Vicsek T (2007) Quantifying social group evolution. Nature 446:664–667
Article Google Scholar
Rege M, Dong M, Fotouhi F (2006) Co-clustering documents and words using bipartite isoperimetric graph partitioning. In: ICDM, pp 532–541
Schaeffer S (2007) Graph clustering. Comput Sci Rev 1(1):27–64
Article Google Scholar
Schepers J, Van Mechelen I, Ceulemans E (2006) Three-mode partitioning. Comput Stat Data Anal 51(3):1623–1642
Article MathSciNet Google Scholar
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423
Article MathSciNet Google Scholar
Slonim N, Tishby N (1999) Agglomerative information bottleneck. Adv Neural Inf Process Syst 12:617–623
Google Scholar
Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partition. JMLR 3:583–617
MathSciNet MATH Google Scholar
Sun J, Faloutsos C, Papadimitriou S, Yu P (2007) Graphscope: parameter-free mining of large time-evolving graphs. In: KDD ’07, pp 687–696
Van Mechelen I, Bock HH, De Boeck P (2004) Two-mode clustering methods: a structured overview. Stat Methods Med Res 13(5):363–394
Article MathSciNet Google Scholar
White DR, Reitz KP (1983) Graph and semigroup homomorphisms on networks of relations. Soc Netw 5(2):193–324
Article MathSciNet Google Scholar
White H, Boorman S, Breiger R (1976) Social structure from multiple networks: I. Blockmodels of roles and positions. Am J Sociol 81(4):730–780
Article Google Scholar
Xing EP, Fu W, Song L (2010) A state-space mixed membership blockmodel for dynamic network tomography. Ann Appl Stat 4(2):535–566
Article MathSciNet Google Scholar
Zhao L, Zaki M (2005) Tricluster: an effective algorithm for mining coherent clusters in 3d microarray data. In: SIGMOD conference, pp 694–705

Download references

Acknowledgments

The authors thank the anonymous reviewers and the associate editor for their valuable comments that helped improving this paper.

Author information

Authors and Affiliations

Orange Labs, 2 avenue Pierre Marzin, 22300, Lannion, France
Romain Guigourès & Marc Boullé
SAMM EA 45 43, Université Paris 1, 90 rue Tolbiac, 75013, Paris, France
Fabrice Rossi

Authors

Romain Guigourès
View author publications
You can also search for this author in PubMed Google Scholar
Marc Boullé
View author publications
You can also search for this author in PubMed Google Scholar
Fabrice Rossi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Fabrice Rossi.

Appendix 1: Interpretations of the dissimilarity between two clusters

Interestingly, the dissimilarity given in Definition 3 receives several interpretations. It corresponds to a loss of coding length (when the MODL criterion is interpreted as a description length), a loss of posterior probability of the triclustering given the data (see Proposition 1), and asymptotically to a divergence between probability distributions associated to the clusters (see Proposition 2).

Proposition 1

The exponential of the dissimilarity between two clusters, $c_1$ and $c_2$, gives the inverse ratio between the probability of the simplified triclustering given the data set and the probability of the original triclustering given the data set:

$$\begin{aligned} P(\mathcal {M}|E)=e^{\Delta _{ MODL }(c_1,c_2)} P(\mathcal {M}_{\text{ merge } c_1 \text { and }c_2}|E). \end{aligned}$$

(31)

Asymptotically—i.e when the number of edges tends to infinity - the dissimilarity between two clusters is proportional to a generalized Jensen–Shannon divergence between two distributions that characterize the clusters in the triclustering structure. To simplify the discussion, we give only the definition and result for the case of source clusters, but this can be generalized to the two other cases.

Definition 5

Let $\mathcal {M}$ be a triclustering. For all $i\in \{1,\ldots ,k_S\}$ we denote

$$\begin{aligned} \mathbb {P}^S_i=\left( \frac{\mu _{ijl}}{\mu _{{i}..}}\right) _{1\le j\le k_D, 1\le l\le k_T}. \end{aligned}$$

(32)

The matrix $\mathbb {P}^S_i$ can be interpreted as a probability distribution over $\{1, \ldots , k_D\}\times \{1, \ldots , k_T\}$. It characterizes $c^S_i$ as a cluster of source vertices as seen from clusters of destination vertices and of time stamps.

We denote $\mathbb {P}^S$ the associated marginal probability distribution obtained by

$$\begin{aligned} \mathbb {P}^S=\left( \frac{\sum _{i=1}^{k_S}\mu _{ijl}}{\sum _{i=1}^{k_S}\mu _{{i}..}}\right) _{1\le j\le k_D, 1\le l\le k_T}. \end{aligned}$$

(33)

Obviously, we have

$$\begin{aligned} \mathbb {P}^S=\sum _{i=1}^{k_S}\pi _i\mathbb {P}^S_i, \end{aligned}$$

(34)

where

$$\begin{aligned} \pi _i=\frac{\mu _{{i}..}}{\sum _{k=1}^{k_S}\mu _{{k}..}}. \end{aligned}$$

(35)

Proposition 2

Let $\mathcal {M}$ be a triclustering and let $c^S_i$ and $c^S_k$ be two source clusters. Then

$$\begin{aligned} \dfrac{\Delta _{ MODL }(c^S_i,c^S_k)}{\nu }\underset{\nu \rightarrow +\infty }{\longrightarrow } (\pi _i+\pi _k) JS^{\alpha _i,\alpha _k} (\mathbb {P}^S_i,\mathbb {P}^S_k), \end{aligned}$$

(36)

with

$$\begin{aligned} JS^{\alpha _i,\alpha _k} (\mathbb {P}^S_i,\mathbb {P}^S_k)=\alpha _i KL(\mathbb {P}^S_i || \alpha _i \mathbb {P}^S_i + \alpha _k \mathbb {P}^S_k) + \alpha _k KL(\mathbb {P}^S_k || \alpha _i \mathbb {P}^S_i + \alpha _k \mathbb {P}^S_k), \end{aligned}$$

(37)

and where $\alpha _i$ and $\alpha _k$ are the normalized mixture coefficients such as $\alpha _i = \frac{\pi _i}{\pi _i+\pi _k}$ and $\alpha _k = \frac{\pi _k}{\pi _i+\pi _k}$.

Proof

JS is the generalized Jensen–Shannon Divergence (Lin 1991) and KL, the Kullback–Leibler Divergence. The full proof is left out for brevity and relies on the Stirling approximation: $\log n!= n \log (n) - n + O(\log n)$, when the difference between the criterion value after and before the merge is computed. $\square $

The Jensen–Shannon divergence has some interesting properties: it is a symmetric and non-negative divergence measure between two probability distributions. In addition, the Jensen–Shannon divergence of two identical distributions is equal to zero. While this divergence is not a metric, as it is not sub-additive, it has nevertheless the minimal properties needed to be used as a dissimilarity measure within an agglomerative process in the context of co-clustering (Slonim and Tishby 1999).

Rights and permissions

Reprints and permissions

About this article

Cite this article

Guigourès, R., Boullé, M. & Rossi, F. Discovering patterns in time-varying graphs: a triclustering approach. Adv Data Anal Classif 12, 509–536 (2018). https://doi.org/10.1007/s11634-015-0218-6

Download citation

Received: 14 February 2013
Revised: 12 May 2015
Accepted: 28 September 2015
Published: 13 October 2015
Issue Date: September 2018
DOI: https://doi.org/10.1007/s11634-015-0218-6

Keywords

Mathematics Subject Classification

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Discovering patterns in time-varying graphs: a triclustering approach

Abstract

Access this article

Similar content being viewed by others

Density-Based Clustering Based on Hierarchical Density Estimates

A survey of methods for time series change point detection

A review and evaluation of elastic distance functions for time series clustering

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Appendix 1: Interpretations of the dissimilarity between two clusters

Proposition 1

Definition 5

Proposition 2

Proof

Rights and permissions

About this article

Cite this article

Keywords

Mathematics Subject Classification

Navigation

Discovering patterns in time-varying graphs: a triclustering approach

Abstract

Access this article

Similar content being viewed by others

Density-Based Clustering Based on Hierarchical Density Estimates

A survey of methods for time series change point detection

A review and evaluation of elastic distance functions for time series clustering

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Appendix 1: Interpretations of the dissimilarity between two clusters

Appendix 1: Interpretations of the dissimilarity between two clusters

Proposition 1

Definition 5

Proposition 2

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation