Pattern Extraction from Graphs and Beyond

Multimedia Services in Intelligent Environments

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 24)

Abstract

We review recent studies on pattern extraction from large-scale graphs. Patterns can be represented explicitly or implicitly. Explicit patterns are concrete subgraphs defined in graph theory, such as cliques and trees. For this explicit model, we introduce notable fast algorithms for finding all frequent patterns, and we confirm that these problems are closely related to traditional problems in data mining. Implicit patterns, on the other hand, are defined by statistical factors such as modularity, betweenness, and flow, which determine optimal hidden subgraphs. For both models, we give an introductory survey focusing on notable pattern extraction algorithms.
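As a small illustration of an explicit pattern, the sketch below enumerates all maximal cliques of an undirected graph with the basic Bron-Kerbosch recursion. This is a textbook technique chosen here for brevity, not necessarily one of the algorithms the chapter surveys; the helper names `build_adjacency` and `maximal_cliques` and the toy edge list are assumptions made for this example.

```python
from typing import Dict, FrozenSet, Iterable, Iterator, Set, Tuple

Graph = Dict[int, Set[int]]


def build_adjacency(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build a symmetric adjacency map for an undirected graph."""
    adj: Graph = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj: Graph) -> Iterator[FrozenSet[int]]:
    """Yield every maximal clique (basic Bron-Kerbosch, no pivoting)."""

    def expand(r: Set[int], p: Set[int], x: Set[int]) -> Iterator[FrozenSet[int]]:
        # r: current clique, p: candidates that extend r,
        # x: vertices already explored (used to reject non-maximal cliques).
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle {1, 2, 3} with a tail 3-4-5.
    toy = build_adjacency([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)])
    for clique in maximal_cliques(toy):
        print(sorted(clique))
```

Without pivoting the recursion can revisit many non-maximal branches, so this version is only suitable for small graphs.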

Partially supported by KAKENHI (23680016, 20589824) and the JST PRESTO program.

Notes

  1. There are other measures that define betweenness without using shortest paths. Shortest-path betweenness is, however, easier to compute than these alternatives.

  2. We assume that every edge in the web graph has unit weight.

  3. For other classes of trees, see, e.g., [33].
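To make the shortest-path notion of betweenness in Note 1 concrete, here is a minimal sketch that computes vertex betweenness on an unweighted, undirected graph by running one breadth-first search per source and accumulating dependencies in the style of Brandes' algorithm. It is an illustration under stated assumptions, not the chapter's own procedure: the function name `shortest_path_betweenness` and the toy graph are hypothetical, and the adjacency map is assumed to be symmetric.

```python
from collections import deque
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def shortest_path_betweenness(adj: Graph) -> Dict[Hashable, float]:
    """Unnormalized shortest-path betweenness of every vertex."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in adj}
        order = []  # vertices in non-decreasing distance from s
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies from the farthest vertices.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


if __name__ == "__main__":
    # Hypothetical toy graph: two triangles joined by the bridge 3-4.
    toy = {
        1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
        4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
    }
    print(shortest_path_betweenness(toy))
```

In the toy graph the two bridge endpoints receive the highest scores, because every shortest path between the two triangles passes through them.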

References

  1. Abe, K., Kawasoe, S., Asai, T., Arimura, H., Arikawa, S.: Optimized substructure discovery for semi-structured data. In: PKDD, pp. 1–14 (2002)

  2. Agrawal, R., Srikant, R.: Mining sequential patterns. In: ICDE, pp. 3–14 (1995)

  3. Allan, J., Papka, R., Lavrenko, V.: On-line new event detection and tracking. In: SIGIR, pp. 37–45 (1998)

  4. Asai, T., Abe, K., Kawasoe, S., Arimura, H., Sakamoto, H., Arikawa, S.: Efficient substructure discovery from large semi-structured data. In: SDM, pp. 158–174 (2002)

  5. Asai, T., Arimura, H., Abe, K., Kawasoe, S., Arikawa, S.: Online algorithms for mining semi-structured data stream. In: ICDM, pp. 27–34 (2002)

  6. Asai, T., Arimura, H., Uno, T., Nakano, S.: Discovering frequent substructures in large unordered trees. In: Discovery Science, pp. 47–61 (2003)

  7. Backstrom, L., Huttenlocher, D.P., Kleinberg, J.M., Lan, X.: Group formation in large social networks: membership, growth, and evolution. In: KDD, pp. 44–54 (2006)

  8. Batagelj, V., Zaversnik, M.: An O(m) algorithm for cores decomposition of networks. arXiv, preprint cs/0310049 (2003)

  9. Berger-Wolf, T.Y., Saia, J.: A framework for analysis of dynamic social networks. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 523–528. ACM (2006)

  10. Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 10008 (2008)

  11. Brandes, U., Delling, D., Gaertler, M., Gorke, R., Hoefer, M., Nikoloski, Z., Wagner, D.: On modularity clustering. IEEE Trans. Knowl. Data Eng. 20, 172–188 (2008)

  12. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Networks ISDN Syst. 30(1), 107–117 (1998)

  13. Ceglar, A., Roddick, J.F.: Association mining. ACM Comput. Surv. 38(2), 5 (2006)

  14. Chen, G., Wu, X., Zhu, X.: Sequential pattern mining in multiple streams. In: ICDM, pp. 27–30 (2005)

  15. Chi, Y., Wang, H., Yu, P.S., Muntz, R.R.: Moment: maintaining closed frequent itemsets over a stream sliding window. In: ICDM, pp. 59–66 (2004)

  16. Chiba, N., Nishizeki, T.: Arboricity and subgraph listing algorithms. SIAM J. Comput. 14(1), 210–223 (1985)

  17. Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70, 66111–66117 (2004)

  18. Cohen, J.D.: Trusses: cohesive subgraphs for social network analysis. National Security Agency Technical Report (2008)

  19. Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. MIT press and McGraw-Hill Book Co., Cambridge (1992)

  20. Dehaspe, L., Toivonen, H., King, R.D.: Finding frequent substructures in chemical compounds. In: KDD, pp. 30–36 (1998)

  21. Diestel, R.: Graph Theory. Springer, Heidelberg (2000)

  22. Ezeife, C.I., Monwar, M.: SSM: a frequent sequential data stream patterns miner. In: CIDM, pp. 120–126 (2007)

  23. Flake, G.W., Lawrence, S., Giles, C.L.: Efficient identification of web communities. In: KDD, pp. 150–160 (2000)

  24. Flake, G.W., Lawrence, S., Giles, C.L., Coetzee, F.: Self-organization and identification of web communities. IEEE Comput. 35(3), 66–71 (2002)

  25. Freeman, L.C.: A set of measures of centrality based upon betweenness. Sociometry 40, 35–41 (1977)

  26. Fu, W., Song, L., Xing, E.P.: Dynamic mixed membership blockmodel for evolving networks. In: Proceedings of the 26th annual international conference on machine learning, pp. 329–336. ACM (2009)

  27. Fung, G.P.C., Yu, J.X., Yu, P.S., Lu, H.: Parameter free bursty events detection in text streams. In: VLDB, pp. 181–192 (2005)

  28. Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. PNAS 99(12), 7821–7826 (2002)

  29. Goldberg, A.V., Tarjan, R.E.: A new approach to the maximal flow problem. In: STOC, pp. 136–146 (1986)

  30. Greene, D., Doyle, D., Cunningham, P.: Tracking the evolution of communities in dynamic social networks. In: 2010 international conference on advances in social networks analysis and mining (ASONAM), pp. 176–183. IEEE (2010)

  31. Hido, S., Kawano, H.: AMIOT: Induced ordered tree mining in tree-structured databases. In: ICDM, pp. 170–177 (2005)

  32. Ishiguro, K., Iwata, T., Ueda, N., Tenenbaum, J.: Dynamic infinite relational model for time-varying relational data analysis. Adv. Neural Inf. Process. Syst. 23, 919–927 (2010)

  33. Jiménez, A., Berzal, F., Cubero, J.-C.: Frequent tree pattern mining: a survey. Intell. Data Anal. 14(6), 603–622 (2010)

  34. Keogh, E., Lonardi, S., Chiu, B.Y.-C.: Finding surprising patterns in a time series database in linear time and space. In: KDD, pp. 550–556 (2002)

  35. Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. J. ACM 46(5), 604–632 (1999)

  36. Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: Extracting large-scale knowledge bases from the web. In: VLDB, pp. 639–650 (1999)

  37. Latapy, M.: Main-memory triangle computations for very large (sparse (power-law)) graphs. Theor. Comput. Sci. 407(1–3), 458–473 (2008)

  38. Leskovec, J., Backstrom, L., Kleinberg, J.: Meme-tracking and the dynamics of the news cycle. In: KDD, pp. 497–506 (2009)

  39. Leskovec, J., Horvitz, E.: Planetary-scale views on a large instant-messaging network. In: WWW, pp. 915–924 (2008)

  40. Li, H.-F., Lee, S.Y.: Mining frequent itemsets over data streams using efficient window sliding techniques. Expert Syst. Appl. 36, 1466–1477 (2009)

  41. Li, H.-F., Lee, S.Y., Shan, M.-K.: Online mining (recently) maximal frequent itemsets over data streams. In: RIDE-SDMA, pp. 11–18 (2005)

  42. Lin, Y.R., Chi, Y., Zhu, S., Sundaram, H., Tseng, B.L.: Facetnet: a framework for analyzing communities and their evolutions in dynamic networks. In: Proceedings of the 17th international conference on World Wide Web, pp. 685–694. ACM (2008)

  43. Makino, K., Uno, T.: New algorithms for enumerating all maximal cliques. In: SWAT, pp. 260–272 (2004)

  44. Manku, G., Motwani, R.: Approximate frequency counts over data streams. In: VLDB, pp. 346–357 (2002)

  45. Mokken, R.J.: Cliques, clubs and clans. Qual. Quant. 13(2), 161–173 (1979)

  46. Moon, J.W., Moser, L.: On cliques in graphs. Isr. J. Math. 3, 23–28 (1965)

  47. Morishita, S.: On classification and regression. In: Discovery Science, pp. 40–57 (1998)

  48. Nakamura, Y., Horiike, T., Kuboyama, T., Sakamoto, H.: Extracting research communities from bibliographic data. KES J. 16(1), 25–34 (2012)

  49. Nakano, S., Uno, T.: Efficient generation of rooted trees. Technical report, NII Technical Report NII-2003-005E (2003)

  50. Newman, M.E.J.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69, 066133 (2004)

  51. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)

  52. Nijssen, S., Kok, J.N.: Efficient discovery of frequent unordered trees. In: 1st international workshop on mining graphs, trees, and sequences (MGTS), pp. 55–64 (2003)

  53. Nijssen, S., Kok, J.N.: A quickstart in frequent structure mining can make a difference. In: KDD, pp. 647–652 (2004)

  54. Oates, T., Cohen, P.R.: Searching for structure in multiple streams of data. In: ICML, pp. 346–354 (1996)

  55. Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.-C.: Prefixspan: mining sequential patterns efficiently by prefix-projected pattern growth. In: ICDE, pp. 215–224 (2001)

  56. Pei, J., Han, J., Wang, W.: Constraint-based sequential pattern mining: the pattern-growth methods. J. Intell. Inf. Syst. 28(2), 133–160 (2007)

  57. Qiu, J., Lin, Z.: A framework for exploring organizational structure in dynamic social networks. Decis. Support Syst. 51(4), 760–771 (2011)

  58. Raissi, C., Poncelet, P., Teisseire, M.: SPEED: mining maximal sequential patterns over data streams. In: International IEEE conference on intelligent systems, pp. 546–552 (2006)

  59. Schank, T., Wagner, D.: Finding, counting and listing all triangles in large graphs, an experimental study. In: WEA, pp. 606–609 (2005)

  60. Seidman, S.B.: Network structure and minimum degree. Social Networks 5(3), 269–287 (1983)

  61. Seidman, S.B., Foster, B.L.: A graph-theoretic generalization of the clique concept. J. Math. Soc. 6(1), 139–154 (1978)

  62. Snowsill, T., Nicart, F., Stefani, M., De Bie, T., Cristianini, N.: Finding surprising patterns in textual data streams. In: International workshop on cognitive information processing, pp. 405–410 (2010)

  63. Srikant, R., Agrawal, R.: Mining sequential patterns: generalizations and performance improvements. In: EDBT, pp. 3–17 (1996)

  64. Tantipathananandh, C., Berger-Wolf, T., Kempe, D.: A framework for community identification in dynamic social networks. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 717–726. ACM (2007)

  65. Tatikonda, S., Parthasarathy, S., Kurc, T.M.: Trips and tides: new algorithms for tree mining. In: CIKM, pp. 455–464 (2006)

  66. Tomita, E., Tanaka, A., Takahashi, H.: The worst-case time complexity for generating all maximal cliques and computational experiments. Theor. Comput. Sci. 363(1), 28–42 (2006)

  67. Tsukiyama, S., Ide, M., Ariyoshi, H., Shirakawa, I.: A new algorithm for generating all the maximal independent sets. SIAM J. Comput. 6, 505–517 (1977)

  68. Uno, T., Asai, T., Uchida, Y., Arimura, H.: LCM: an efficient algorithm for enumerating frequent closed item sets. In: FIMI (2003)

  69. Wang, J., Cheng, J.: Truss decomposition in massive networks. PVLDB 5(9), 812–823 (2012)

  70. Wang, N., Zhang, J., Tan, K.-L., Tung, A.K.H.: On triangulation-based dense neighborhood graph discovery. In: VLDB, pp. 58–68 (2010)

  71. Wang, N., Zhang, J., Tan, K.L., Tung, A.K.H.: On triangulation-based dense neighborhood graph discovery. Proc. VLDB Endowment 4(2), 58–68 (2010)

  72. Wasserman, S., Faust, K.: Social network analysis: methods and applications. Cambridge University Press, Cambridge (1994)

  73. Zaki, M.J.: Efficiently mining frequent trees in a forest. In: KDD, pp. 71–80 (2002)

  74. Zaki, M.J.: Efficiently mining frequent embedded unordered trees. Fundam. Inform. 66(1–2), 33–52 (2005)

  75. Zaki, M.J.: Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Trans. Knowl. Data Eng. 17(8), 1021–1035 (2005)

  76. Zaki, M.J., Ogihara, M.: Theoretical foundation of association rules. In: Workshop on data-mining and knowledge discovery (1998)

Author information

Corresponding author

Correspondence to Hiroshi Sakamoto.

Copyright information

© 2013 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Sakamoto, H., Kuboyama, T. (2013). Pattern Extraction from Graphs and Beyond. In: Tsihrintzis, G., Virvou, M., Jain, L. (eds) Multimedia Services in Intelligent Environments. Smart Innovation, Systems and Technologies, vol 24. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00372-6_7

  • DOI: https://doi.org/10.1007/978-3-319-00372-6_7

  • Publisher Name: Springer, Heidelberg

  • Print ISBN: 978-3-319-00371-9

  • Online ISBN: 978-3-319-00372-6

  • eBook Packages: Engineering, Engineering (R0)
