Abstract
We discuss the problem of extending data mining approaches to cases in which each data point is an individual graph. Finding the intrinsic low dimensionality in ensembles of graphs can be useful in a variety of modeling contexts, especially when coarse-graining the detailed graph information is of interest. One of the main challenges in mining graph data is the definition of a suitable pairwise similarity metric in the space of graphs. We explore two practical solutions to this problem: one based on subgraph densities, and one using spectral information. The approach is illustrated on three test data sets (ensembles of graphs); two of these are obtained from standard graph-generating algorithms in the literature, while the graphs in the third are sampled as dynamic snapshots from an evolving network simulation. We further combine these approaches with equation-free techniques, demonstrating how such data mining can enhance scientific computation of network evolution dynamics.
To Bernold Fiedler, with admiration for his choice of research problems in mathematics and modeling, and for what he has taught us about them.
Notes
1. An alternative, equivalent way to define the similarity measure would be to compare the contributions of the different eigenvectors to \(S_i\) directly, instead of summing the contributions and then using different values of \(\lambda\). However, this approach is difficult to generalize to ensembles that contain graphs of varying sizes.
Acknowledgements
The work of IGK was partially supported by the US National Science Foundation, as well as by AFOSR (Dr. Darema) and DARPA contract HR0011-16-C-0016.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Rajendran, K., Kattis, A., Holiday, A., Kondor, R., Kevrekidis, I.G. (2017). Data Mining When Each Data Point is a Network. In: Gurevich, P., Hell, J., Sandstede, B., Scheel, A. (eds) Patterns of Dynamics. PaDy 2016. Springer Proceedings in Mathematics & Statistics, vol 205. Springer, Cham. https://doi.org/10.1007/978-3-319-64173-7_17
DOI: https://doi.org/10.1007/978-3-319-64173-7_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64172-0
Online ISBN: 978-3-319-64173-7
eBook Packages: Mathematics and Statistics (R0)