Pattern Extraction from Graphs and Beyond

Sakamoto, Hiroshi; Kuboyama, Tetsuji

doi:10.1007/978-3-319-00372-6_7

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 24)

Abstract

We review recent studies on pattern extraction from large-scale graphs. Patterns can be represented either explicitly or implicitly. Explicit patterns are concrete subgraphs defined in graph theory, such as cliques and trees. For the explicit model, we introduce notable fast algorithms for finding all frequent patterns, and we show that these problems are closely related to traditional problems in data mining. Implicit patterns, on the other hand, are hidden subgraphs that are optimal with respect to statistical measures such as modularity, betweenness, and flow. For both models, we give an introductory survey focusing on notable pattern extraction algorithms.
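The abstract names cliques as one class of explicit patterns, and the references include maximal clique enumeration algorithms by Tomita et al. and by Makino and Uno. Purely as an illustration, the Python sketch below enumerates all maximal cliques with the Bron–Kerbosch recursion and the pivot rule analysed by Tomita et al. (pick the pivot u in P ∪ X that maximizes |P ∩ N(u)|). It is not the authors' code or the programs listed under Mining Program Source; the function name `maximal_cliques` and the dict-of-adjacency-sets input are assumptions made for this example, and the graph is assumed to be undirected and simple, with every vertex present as a key.

```python
from typing import Dict, Hashable, Iterator, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected simple graph as adjacency sets


def maximal_cliques(adj: Graph) -> Iterator[Set[Hashable]]:
    """Yield every maximal clique of `adj` exactly once (illustrative sketch)."""

    def expand(r: Set[Hashable], p: Set[Hashable],
               x: Set[Hashable]) -> Iterator[Set[Hashable]]:
        # r: current clique; p: candidates that can extend r;
        # x: vertices already handled, whose cliques were reported earlier.
        if not p and not x:
            yield r  # nothing can be added to r, so it is maximal
            return
        # Tomita-style pivot: the vertex of P ∪ X with most neighbours in P.
        pivot = max(p | x, key=lambda u: len(p & adj[u]))
        # Branching on neighbours of the pivot is redundant, so skip them.
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


if __name__ == "__main__":
    # A triangle {1, 2, 3} with a pendant edge 3-4.
    adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(sorted(sorted(c) for c in maximal_cliques(adj)))  # [[1, 2, 3], [3, 4]]
```

Branching only on P \ N(u) is what keeps this search within the O(3^{n/3}) worst-case time shown by Tomita et al., which matches the Moon and Moser bound on how many maximal cliques an n-vertex graph can have.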

Partially supported by KAKENHI (23680016, 20589824) and the JST PRESTO program.

Notes

1. Betweenness can be defined by measures other than shortest paths; shortest-path betweenness, however, is easier to compute than the alternatives.
2. Every edge in the web graph is assumed to have unit weight.
3. For other classes of trees, see, e.g., [33] (Jiménez, Berzal, and Cubero, "Frequent tree pattern mining: a survey").
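Notes 1 and 2 fix betweenness to be measured by shortest paths in a graph whose edges all have unit weight. Under exactly those assumptions, shortest paths can be counted by breadth-first search. The Python sketch below computes shortest-path edge betweenness, the quantity that the divisive method of Girvan and Newman (listed in the references) recomputes after each edge removal. It uses the dependency-accumulation technique of Brandes (2001), which is not among this chapter's references; the function name `edge_betweenness` and the adjacency-dict input are assumptions for illustration, and the graph is assumed to be undirected and unweighted, with every vertex present as a key.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected, unit-weight adjacency sets


def edge_betweenness(adj: Graph) -> Dict[FrozenSet[Hashable], float]:
    """Shortest-path edge betweenness of an undirected, unit-weight graph."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Phase 1: BFS from s counts shortest paths sigma[v] and records
        # the predecessors of each vertex on those paths.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in adj}
        order = []  # vertices in non-decreasing distance from s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:  # first visit: w lies one level deeper
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # edge (v, w) is on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: walk back from the farthest vertices; each w passes the
        # share sigma[v] / sigma[w] of (1 + delta[w]) to each predecessor v.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    # Each unordered pair {s, t} was counted once from s and once from t.
    return {e: b / 2 for e, b in eb.items()}


if __name__ == "__main__":
    # Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge 2-3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    eb = edge_betweenness(adj)
    print(eb[frozenset((2, 3))])  # 9.0
```

In the two-triangle example every one of the 3 × 3 pairs split across the bridge has a unique shortest path through edge 2-3, so that edge scores highest and would be the first one removed by a Girvan–Newman style split.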

References

Abe, K., Kawasoe, S., Asai, T., Arimura, H., Arikawa, S.: Optimized substructure discovery for semi-structured data. In: PKDD, pp. 1–14 (2002)
Agrawal, R., Srikant, R.: Mining sequential patterns. In: ICDE, pp. 3–14 (1995)
Allan, J., Papka, R., Lavrenko, V.: On-line new event detection and tracking. In: SIGIR, pp. 37–45 (1998)
Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., Arikawa, S.: Efficient substructure discovery from large semi-structured data. In: SDM, pp. 158–174 (2002)
Asai, T., Arimura, H., Abe, K., Kawasoe, S., Arikawa, S.: Online algorithms for mining semi-structured data stream. In: ICDM, pp. 27–34 (2002)
Asai, T., Arimura, H., Uno, T., Nakano, S.: Discovering frequent substructures in large unordered trees. In: Discovery Science, pp. 47–61 (2003)
Backstrom, L., Huttenlocher, D.P., Kleinberg, J.M., Lan, X.: Group formation in large social networks: membership, growth, and evolution. In: KDD, pp. 44–54 (2006)
Batagelj, V., Zaversnik, M.: An O(m) algorithm for cores decomposition of networks. arXiv preprint cs/0310049 (2003)
Berger-Wolf, T.Y., Saia, J.: A framework for analysis of dynamic social networks. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 523–528. ACM (2006)
Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 10008 (2008)
Brandes, U., Delling, D., Gaertler, M., Görke, R., Hoefer, M., Nikoloski, Z., Wagner, D.: On modularity clustering. IEEE Trans. Knowl. Data Eng. 20, 172–188 (2008)
Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Networks ISDN Syst. 30(1), 107–117 (1998)
Ceglar, A., Roddick, J.F.: Association mining. ACM Comput. Surv. 38(2), 5 (2006)
Chen, G., Wu, X., Zhu, X.: Sequential pattern mining in multiple streams. In: ICDM, pp. 27–30 (2005)
Chi, Y., Wang, H., Yu, P.S., Muntz, R.R.: Moment: maintaining closed frequent itemsets over a stream sliding window. In: ICDM, pp. 59–66 (2004)
Chiba, N., Nishizeki, T.: Arboricity and subgraph listing algorithms. SIAM J. Comput. 14(1), 210–223 (1985)
Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70, 66111–66117 (2004)
Cohen, J.D.: Trusses: cohesive subgraphs for social network analysis. National Security Agency Technical Report (2008)
Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. MIT Press and McGraw-Hill Book Co., Cambridge (1992)
Dehaspe, L., Toivonen, H., King, R.D.: Finding frequent substructures in chemical compounds. In: KDD, pp. 30–36 (1998)
Diestel, R.: Graph Theory. Springer, Heidelberg (2000)
Ezeife, C.I., Monwar, M.: SSM: a frequent sequential data stream patterns miner. In: CIDM, pp. 120–126 (2007)
Flake, G.W., Lawrence, S., Giles, C.L.: Efficient identification of web communities. In: KDD, pp. 150–160 (2000)
Flake, G.W., Lawrence, S., Giles, C.L., Coetzee, F.: Self-organization and identification of web communities. IEEE Comput. 35(3), 66–71 (2002)
Freeman, L.C.: A set of measures of centrality based upon betweenness. Sociometry 40, 35–41 (1977)
Fu, W., Song, L., Xing, E.P.: Dynamic mixed membership blockmodel for evolving networks. In: Proceedings of the 26th annual international conference on machine learning, pp. 329–336. ACM (2009)
Fung, G.P.C., Yu, J.X., Yu, P.S., Lu, H.: Parameter free bursty events detection in text streams. In: VLDB, pp. 181–192 (2005)
Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. PNAS 99(12), 7821–7826 (2002)
Goldberg, A.V., Tarjan, R.E.: A new approach to the maximal flow problem. In: STOC, pp. 136–146 (1986)
Greene, D., Doyle, D., Cunningham, P.: Tracking the evolution of communities in dynamic social networks. In: 2010 international conference on advances in social networks analysis and mining (ASONAM), pp. 176–183. IEEE (2010)
Hido, S., Kawano, H.: AMIOT: induced ordered tree mining in tree-structured databases. In: ICDM, pp. 170–177 (2005)
Ishiguro, K., Iwata, T., Ueda, N., Tenenbaum, J.: Dynamic infinite relational model for time-varying relational data analysis. Adv. Neural Inf. Process. Syst. 23, 919–927 (2010)
Jiménez, A., Berzal, F., Cubero, J.-C.: Frequent tree pattern mining: a survey. Intell. Data Anal. 14(6), 603–622 (2010)
Keogh, E., Lonardi, S., Chiu, B.Y.-C.: Finding surprising patterns in a time series database in linear time and space. In: KDD, pp. 550–556 (2002)
Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM 46(5), 604–632 (1999)
Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Extracting large-scale knowledge bases from the web. In: VLDB, pp. 639–650 (1999)
Latapy, M.: Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407(1–3), 458–473 (2008)
Leskovec, J., Backstrom, L., Kleinberg, J.: Meme-tracking and the dynamics of the news cycle. In: KDD, pp. 497–506 (2009)
Leskovec, J., Horvitz, E.: Planetary-scale views on a large instant-messaging network. In: WWW, pp. 915–924 (2008)
Li, H.-F., Lee, S.Y.: Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Syst. Appl. 36, 1466–1477 (2009)
Li, H.-F., Lee, S.Y., Shan, M.-K.: Online mining (recently) maximal frequent itemsets over data streams. In: RIDE-SDMA, pp. 11–18 (2005)
Lin, Y.R., Chi, Y., Zhu, S., Sundaram, H., Tseng, B.L.: Facetnet: a framework for analyzing communities and their evolutions in dynamic networks. In: Proceedings of the 17th international conference on World Wide Web, pp. 685–694. ACM (2008)
Makino, K., Uno, T.: New algorithms for enumerating all maximal cliques. In: SWAT, pp. 260–272 (2004)
Manku, G., Motwani, R.: Approximate frequency counts over data streams. In: VLDB, pp. 346–357 (2002)
Mokken, R.J.: Cliques, clubs and clans. Qual. Quant. 13(2), 161–173 (1979)
Moon, J.W., Moser, L.: On cliques in graphs. Isr. J. Math. 3, 23–28 (1965)
Morishita, S.: On classification and regression. In: Discovery Science, pp. 40–57 (1998)
Nakamura, Y., Horiike, T., Kuboyama, T., Sakamoto, H.: Extracting research communities from bibliographic data. KES J. 16(1), 25–34 (2012)
Nakano, S., Uno, T.: Efficient generation of rooted trees. NII Technical Report NII-2003-005E (2003)
Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69, 066133 (2004)
Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)
Nijssen, S., Kok, J.N.: Efficient discovery of frequent unordered trees. In: 1st international workshop on mining graphs, trees, and sequences (MGTS), pp. 55–64 (2003)
Nijssen, S., Kok, J.N.: A quickstart in frequent structure mining can make a difference. In: KDD, pp. 647–652 (2004)
Oates, T., Cohen, P.R.: Searching for structure in multiple streams of data. In: ICML, pp. 346–354 (1996)
Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.-C.: PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: ICDE, pp. 215–224 (2001)
Pei, J., Han, J., Wang, W.: Constraint-based sequential pattern mining: the pattern-growth methods. J. Intell. Inf. Syst. 28(2), 133–160 (2007)
Qiu, J., Lin, Z.: A framework for exploring organizational structure in dynamic social networks. Decis. Support Syst. 51(4), 760–771 (2011)
Raissi, C., Poncelet, P., Teisseire, M.: SPEED: mining maximal sequential patterns over data streams. In: International IEEE conference on intelligent systems, pp. 546–552 (2006)
Schank, T., Wagner, D.: Finding, counting and listing all triangles in large graphs, an experimental study. In: WEA, pp. 606–609 (2005)
Seidman, S.B.: Network structure and minimum degree. Social Networks 5(3), 269–287 (1983)
Seidman, S.B., Foster, B.L.: A graph-theoretic generalization of the clique concept. J. Math. Soc. 6(1), 139–154 (1978)
Snowsill, T., Nicart, F., Stefani, M., De Bie, T., Cristianini, N.: Finding surprising patterns in textual data streams. In: International workshop on cognitive information processing, pp. 405–410 (2010)
Srikant, R., Agrawal, R.: Mining sequential patterns: generalizations and performance improvements. In: EDBT, pp. 3–17 (1996)
Tantipathananandh, C., Berger-Wolf, T., Kempe, D.: A framework for community identification in dynamic social networks. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 717–726. ACM (2007)
Tatikonda, S., Parthasarathy, S., Kurc, T.M.: TRIPS and TIDES: new algorithms for tree mining. In: CIKM, pp. 455–464 (2006)
Tomita, E., Tanaka, A., Takahashi, H.: The worst-case time complexity for generating all maximal cliques and computational experiments. Theor. Comput. Sci. 363(1), 28–42 (2006)
Tsukiyama, S., Ide, M., Ariyoshi, H., Shirakawa, I.: A new algorithm for generating all the maximal independent sets. SIAM J. Comput. 6, 505–517 (1977)
Uno, T., Asai, T., Uchida, Y., Arimura, H.: LCM: an efficient algorithm for enumerating frequent closed item sets. In: FIMI (2003)
Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012)
Wang, N., Zhang, J., Tan, K.-L., Tung, A.K.H.: On triangulation-based dense neighborhood graph discovery. In: VLDB, pp. 58–68 (2010)
Wang, N., Zhang, J., Tan, K.L., Tung, A.K.H.: On triangulation-based dense neighborhood graph discovery. Proc. VLDB Endowment 4(2), 58–68 (2010)
Wasserman, S., Faust, K.: Social network analysis: methods and applications. Cambridge University Press, Cambridge (1994)
Zaki, M.J.: Efficiently mining frequent trees in a forest. In: KDD, pp. 71–80 (2002)
Zaki, M.J.: Efficiently mining frequent embedded unordered trees. Fundam. Inform. 66(1–2), 33–52 (2005)
Zaki, M.J.: Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Trans. Knowl. Data Eng. 17(8), 1021–1035 (2005)
Zaki, M.J., Ogihara, M.: Theoretical foundation of association rules. In: Workshop on data-mining and knowledge discovery (1998)

Source List

Tutorial
Social and Information Network Analysis Course http://www.stanford.edu/class/cs224w/index.html
Material
Stanford Large Network Dataset Collection http://snap.stanford.edu/data
Citation networks http://dblp.uni-trier.de/xml
Internet topology http://topology.eecs.umich.edu/data.html
Youtube http://netsg.cs.sfu.ca/youtubedata
Amazon http://snap.stanford.edu/data/amazon-meta.html
Wikipedia http://users.on.net/~henry/home/wikipedia.htm
Newman’s pointers http://www-personal.umich.edu/~mejn/netdata
Mining Program Source
LCM http://research.nii.ac.jp/~uno/code/lcm.html
LCM for sequential mining http://research.nii.ac.jp/~uno/code/lcm_seq.html
FREQT http://research.nii.ac.jp/~uno/code/FREQT_distMay02_j50.tar.gz
Max clique by Makino and Uno http://research.nii.ac.jp/~uno/code/mace.html
Max clique by Tomita et al. http://research.nii.ac.jp/~uno/code/macego10.zip
Social Network Analysis Tools
Gephi http://gephi.org
Network Workbench http://nwb.cns.iu.edu
Pajek http://pajek.imfm.si
igraph http://igraph.sourceforge.net
Others (list of tools) http://en.wikipedia.org/wiki/Social_network_analysis_software

Author information

Authors and Affiliations

Kyushu Institute of Technology, 680-4 Kawazu, Iizuka-shi, Fukuoka, 820-8502, Japan
Hiroshi Sakamoto
Gakushuin University, 1-5-1 Mejiro, Toshima-ku, Tokyo, 171-8588, Japan
Tetsuji Kuboyama
PRESTO JST, 4-1-8 Honcho, Kawaguchi, Saitama, 332-0012, Japan
Hiroshi Sakamoto

Corresponding author

Correspondence to Hiroshi Sakamoto.

Editor information

Editors and Affiliations

Department of Informatics, University of Piraeus, Karaoli & Dimitriou St. 80, Piraeus, 18534, Greece
George A. Tsihrintzis
Department of Informatics, University of Piraeus, Karaoli & Dimitriou St. 80, Piraeus, 18534, Greece
Maria Virvou
School of Electrical and, University of South Australia, Mawson Lakes Campus, Adelaide, 5095, South Australia, Australia
Lakhmi C. Jain

About this chapter

Cite this chapter

Sakamoto, H., Kuboyama, T. (2013). Pattern Extraction from Graphs and Beyond. In: Tsihrintzis, G., Virvou, M., Jain, L. (eds) Multimedia Services in Intelligent Environments. Smart Innovation, Systems and Technologies, vol 24. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00372-6_7

DOI: https://doi.org/10.1007/978-3-319-00372-6_7
Published: 24 May 2013
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00371-9
Online ISBN: 978-3-319-00372-6
eBook Packages: Engineering, Engineering (R0)