Abstract
We survey recent studies on pattern extraction from large-scale graphs. Patterns are represented either explicitly or implicitly. Explicit patterns are concrete subgraphs defined in graph theory, e.g., cliques and trees. For the explicit model, we introduce notable fast algorithms for finding all frequent patterns, and we confirm that these problems are closely related to traditional problems in data mining. Implicit patterns, on the other hand, are defined by statistical factors, e.g., modularity, betweenness, and flow, that determine optimal hidden subgraphs. For both models, we give an introductory survey focusing on notable pattern extraction algorithms.
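As a concrete illustration of an implicit pattern measure, the following minimal sketch computes the modularity of a given partition, assuming the standard Newman-Girvan definition for an undirected, unweighted graph without self-loops or duplicate edges. The function name `modularity` and its input format are hypothetical choices for this example, not taken from the chapter.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition (Newman-Girvan definition, assumed here).

    edges:     iterable of (u, v) pairs of an undirected, unweighted graph
    community: dict mapping every vertex to its community label
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)   # number of edges inside each community
    degree = defaultdict(int)  # sum of vertex degrees in each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (fraction of internal edges)
    #     minus (expected fraction under random rewiring).
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(edges, split), 4))  # 0.3571 (= 5/14)
```

Placing all six vertices in one community gives Q = 0, so the two-triangle split scores higher, which matches the intuition that modularity rewards densely connected groups with few links between them.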
Partially supported by KAKENHI (23680016, 20589824) and JST PRESTO program.
Notes
- 1. Betweenness can also be defined with measures other than the shortest path. The shortest path is, however, easier to compute than the alternatives.
- 2. We assume that every edge in the web graph has unit weight.
- 3. For other classes of trees, see e.g., [33].
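To illustrate Notes 1 and 2, the following minimal sketch computes shortest-path betweenness of vertices in an undirected graph with unit edge weights. It uses Brandes' accumulation technique over breadth-first searches, which is a choice made for this example rather than a method confirmed by the chapter; the function name `betweenness` and the adjacency-dict input are likewise hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Shortest-path betweenness of each vertex (unit edge weights assumed).

    adj: dict mapping each vertex to an iterable of its neighbours;
         the graph is assumed undirected (each edge listed in both directions).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair (s, t) was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


# Path graph 0 - 1 - 2: only the middle vertex lies between a pair.
print(betweenness({0: [1], 1: [0, 2], 2: [1]}))  # {0: 0.0, 1: 1.0, 2: 0.0}
```

With unit weights a breadth-first search suffices for each source, so the sketch runs in O(nm) time for n vertices and m edges; weighted graphs would need a shortest-path routine such as Dijkstra's algorithm instead.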
References
Abe, K., Kawasoe, S., Asai, T., Arimura, H., Arikawa, S.: Optimized substructure discovery for semi-structured data. In: PKDD, pp. 1–14 (2002)
Agrawal, R., Srikant, R.: Mining sequential patterns. In: ICDE, pp. 3–14 (1995)
Allan, J., Papka, R., Lavrenko, V.: On-line new event detection and tracking. In: SIGIR, pp. 37–45 (1998)
Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., Arikawa, S.: Efficient substructure discovery from large semi-structured data. In: SDM, pp. 158–174 (2002)
Asai, T., Arimura, H., Abe, K., Kawasoe, S., Arikawa, S.: Online algorithms for mining semi-structured data stream. In: ICDM, pp. 27–34 (2002)
Asai, T., Arimura, H., Uno, T., Nakano, S.: Discovering frequent substructures in large unordered trees. In: Discovery Science, pp. 47–61 (2003)
Backstrom, L., Huttenlocher, D.P., Kleinberg, J.M., Lan, X.: Group formation in large social networks: membership, growth, and evolution. In: KDD, pp. 44–54 (2006)
Batagelj, V., Zaversnik, M.: An O(m) algorithm for cores decomposition of networks. arXiv, preprint cs/0310049 (2003)
Berger-Wolf, T.Y., Saia, J.: A framework for analysis of dynamic social networks. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 523–528. ACM (2006)
Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 10008 (2008)
Brandes, U., Delling, D., Gaertler, M., Gorke, R., Hoefer, M., Nikoloski, Z., Wagner, D.: On modularity clustering. IEEE Trans. Knowl. Data Eng. 20, 172–188 (2008)
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Networks ISDN Syst. 30(1), 107–117 (1998)
Ceglar, A., Roddick, J.F.: Association mining. ACM Comput. Surv. 38(2), 5 (2006)
Chen, G., Wu, X., Zhu, X.: Sequential pattern mining in multiple streams. In: ICDM, pp. 27–30 (2005)
Chi, Y., Wang, H., Yu, P.S., Muntz, R.R.: Moment: maintaining closed frequent itemsets over a stream sliding window. In: ICDM, pp. 59–66 (2004)
Chiba, N., Nishizeki, T.: Arboricity and subgraph listing algorithms. SIAM J. Comput. 14(1), 210–223 (1985)
Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70, 66111–66117 (2004)
Cohen, J.D.: Trusses: cohesive subgraphs for social network analysis. National Security Agency Technical Report (2008)
Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. MIT press and McGraw-Hill Book Co., Cambridge (1992)
Dehaspe, L., Toivonen, H., King, R.D.: Finding frequent substructures in chemical compounds. In: KDD, pp. 30–36 (1998)
Diestel, R.: Graph Theory. Springer, Heidelberg (2000)
Ezeife, C.I., Monwar, M.: SSM: a frequent sequential data stream patterns miner. In: CIDM, pp. 120–126 (2007)
Flake, G.W., Lawrence, S., Giles, C.L.: Efficient identification of web communities. In: KDD, pp. 150–160 (2000)
Flake, G.W., Lawrence, S., Giles, C.L., Coetzee, F.: Self-organization and identification of web communities. IEEE Comput. 35(3), 66–71 (2002)
Freeman, L.C.: A set of measures of centrality based upon betweenness. Sociometry 40, 35–41 (1977)
Fu, W., Song, L., Xing, E.P.: Dynamic mixed membership blockmodel for evolving networks. In: Proceedings of the 26th annual international conference on machine learning, pp. 329–336. ACM (2009)
Fung, G.P.C., Yu, J.X., Yu, P.S., Lu, H.: Parameter free bursty events detection in text streams. In: VLDB, pp. 181–192 (2005)
Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. PNAS 99(12), 7821–7826 (2002)
Goldberg, A.V., Tarjan, R.E.: A new approach to the maximal flow problem. In: STOC, pp. 136–146 (1986)
Greene, D., Doyle, D., Cunningham, P.: Tracking the evolution of communities in dynamic social networks. In: 2010 international conference on advances in social networks analysis and mining (ASONAM), pp. 176–183. IEEE (2010)
Hido, S., Kawano, H.: AMIOT: Induced ordered tree mining in tree-structured databases. In: ICDM, pp. 170–177 (2005)
Ishiguro, K., Iwata, T., Ueda, N., Tenenbaum, J.: Dynamic infinite relational model for time-varying relational data analysis. Adv. Neural Inf. Process. Syst. 23, 919–927 (2010)
Jiménez, A., Berzal, F., Cubero, J.-C.: Frequent tree pattern mining: a survey. Intell. Data Anal. 14(6), 603–622 (2010)
Keogh, E., Lonardi, S., Chiu, B.Y.-C.: Finding surprising patterns in a time series database in linear time and space. In: KDD, pp. 550–556 (2002)
Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM 46(5), 604–632 (1999)
Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Extracting large-scale knowledge bases from the web. In: VLDB, pp. 639–650 (1999)
Latapy, M.: Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407(1–3), 458–473 (2008)
Leskovec, J., Backstrom, L., Kleinberg, J.: Meme-tracking and the dynamics of the news cycle. In: KDD, pp. 497–506 (2009)
Leskovec, J., Horvitz, E.: Planetary-scale views on a large instant-messaging network. In: WWW, pp. 915–924 (2008)
Li, H.-F., Lee, S.Y.: Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Syst. Appl. 36, 1466–1477 (2009)
Li, H.-F., Lee, S.Y., Shan, M.-K.: Online mining (recently) maximal frequent itemsets over data streams. In: RIDE-SDMA, pp. 11–18 (2005)
Lin, Y.R., Chi, Y., Zhu, S., Sundaram, H., Tseng, B.L.: Facetnet: a framework for analyzing communities and their evolutions in dynamic networks. In: Proceedings of the 17th international conference on World Wide Web, pp. 685–694. ACM (2008)
Makino, K., Uno, T.: New algorithms for enumerating all maximal cliques. In: SWAT, pp. 260–272 (2004)
Manku, G., Motwani, R.: Approximate frequency counts over data streams. In: VLDB, pp. 346–357 (2002)
Mokken, R.J.: Cliques, clubs and clans. Qual. Quant. 13(2), 161–173 (1979)
Moon, J.W., Moser, L.: On cliques in graphs. Isr. J. Math. 3, 23–28 (1965)
Morishita, S.: On classification and regression. In: Discovery Science, pp. 40–57 (1998)
Nakamura, Y., Horiike, T., Kuboyama, T., Sakamoto, H.: Extracting research communities from bibliographic data. KES J. 16(1), 25–34 (2012)
Nakano, S., Uno, T.: Efficient generation of rooted trees. Technical report, NII Technical Report NII-2003-005E (2003)
Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69, 066133 (2004)
Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)
Nijssen, S., Kok, J.N.: Efficient discovery of frequent unordered trees. In: 1st international workshop on mining graphs, trees, and sequences (MGTS), pp. 55–64 (2003)
Nijssen, S., Kok, J.N.: A quickstart in frequent structure mining can make a difference. In: KDD, pp. 647–652 (2004)
Oates, T., Cohen, P.R.: Searching for structure in multiple streams of data. In: ICML, pp. 346–354 (1996)
Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.-C.: Prefixspan: mining sequential patterns efficiently by prefix-projected pattern growth. In: ICDE, pp. 215–224 (2001)
Pei, J., Han, J., Wang, W.: Constraint-based sequential pattern mining: the pattern-growth methods. J. Intell. Inf. Syst. 28(2), 133–160 (2007)
Qiu, J., Lin, Z.: A framework for exploring organizational structure in dynamic social networks. Decis. Support Syst. 51(4), 760–771 (2011)
Raissi, C., Poncelet, P., Teisseire, M.: SPEED: mining maximal sequential patterns over data streams. In: International IEEE conference on intelligent systems, pp. 546–552 (2006)
Schank, T., Wagner, D.: Finding, counting and listing all triangles in large graphs, an experimental study. In: WEA, pp. 606–609 (2005)
Seidman, S.B.: Network structure and minimum degree. Social Networks 5(3), 269–287 (1983)
Seidman, S.B., Foster, B.L.: A graph-theoretic generalization of the clique concept. J. Math. Soc. 6(1), 139–154 (1978)
Snowsill, T., Nicart, F., Stefani, M., De Bie, T., Cristianini, N.: Finding surprising patterns in textual data streams. In: International workshop on cognitive information processing, pp. 405–410 (2010)
Srikant, R., Agrawal, R.: Mining sequential patterns: generalizations and performance improvements. In: EDBT, pp. 3–17 (1996)
Tantipathananandh, C., Berger-Wolf, T., Kempe, D.: A framework for community identification in dynamic social networks. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 717–726. ACM (2007)
Tatikonda, S., Parthasarathy, S., Kurc, T.M.: Trips and tides: new algorithms for tree mining. In: CIKM, pp. 455–464 (2006)
Tomita, E., Tanaka, A., Takahashi, H.: The worst-case time complexity for generating all maximal cliques and computational experiments. Theor. Comput. Sci. 363(1), 28–42 (2006)
Tsukiyama, S., Ide, M., Ariyoshi, H., Shirakawa, I.: A new algorithm for generating all the maximal independent sets. SIAM J. Comput. 6, 505–517 (1977)
Uno, T., Asai, T., Uchida, Y., Arimura, H.: LCM: an efficient algorithm for enumerating frequent closed item sets. In: FIMI (2003)
Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012)
Wang, N., Zhang, J., Tan, K.-L., Tung, A.K.H.: On triangulation-based dense neighborhood graph discovery. In: VLDB, pp. 58–68 (2010)
Wang, N., Zhang, J., Tan, K.L., Tung, A.K.H.: On triangulation-based dense neighborhood graph discovery. Proc. VLDB Endowment 4(2), 58–68 (2010)
Wasserman, S., Faust, K.: Social network analysis: methods and applications. Cambridge University Press, Cambridge (1994)
Zaki, M.J.: Efficiently mining frequent trees in a forest. In: KDD, pp. 71–80 (2002)
Zaki, M.J.: Efficiently mining frequent embedded unordered trees. Fundam. Inform. 66(1–2), 33–52 (2005)
Zaki, M.J.: Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Trans. Knowl. Data Eng. 17(8), 1021–1035 (2005)
Zaki, M.J., Ogihara, M.: Theoretical foundation of association rules. In: Workshop on data-mining and knowledge discovery (1998)
Source List
Tutorial
Social and Information Network Analysis Course http://www.stanford.edu/class/cs224w/index.html
Material
Stanford Large Network Dataset Collection http://snap.stanford.edu/data
Citation networks http://dblp.uni-trier.de/xml
Internet topology http://topology.eecs.umich.edu/data.html
Newman’s pointers http://www-personal.umich.edu/~mejn/netdata
Mining Program Source
LCM for sequential mining http://research.nii.ac.jp/~uno/code/lcm_seq.html
FREQT http://research.nii.ac.jp/~uno/code/FREQT_distMay02_j50.tar.gz
Max clique by Makino and Uno http://research.nii.ac.jp/~uno/code/mace.html
Max clique by Tomita et al. http://research.nii.ac.jp/~uno/code/macego10.zip
Social Network Analysis Tools
Gephi http://gephi.org
Network Workbench http://nwb.cns.iu.edu
Pajek http://pajek.imfm.si
Others (list of tools) http://en.wikipedia.org/wiki/Social_network_analysis_software
© 2013 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Sakamoto, H., Kuboyama, T. (2013). Pattern Extraction from Graphs and Beyond. In: Tsihrintzis, G., Virvou, M., Jain, L. (eds) Multimedia Services in Intelligent Environments. Smart Innovation, Systems and Technologies, vol 24. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00372-6_7
DOI: https://doi.org/10.1007/978-3-319-00372-6_7
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00371-9
Online ISBN: 978-3-319-00372-6
eBook Packages: Engineering, Engineering (R0)