Skip to main content

Partition and Conquer: Map/Reduce Way of Substructure Discovery

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 9263))

Abstract

Transactional data mining (decision trees, association rules etc.) has been used to discover non trivial patterns in unstructured data. For applications that have an inherent structure (such as social networks, phone networks etc.) graph mining is useful as mapping such data into an unstructured representation will lead to loss of relationships. Graph mining finds use in a plethora of applications: analysis of fraud detection in transaction networks, finding friendships and other characteristics are to name a few. Finding interesting and frequent substructures is central to graph mining in all of these applications. Until now, graph mining has been addressed using main memory, disk-based as well as database-oriented approaches to deal with progressively larger sizes of applications.

This paper presents two algorithms using the Map/Reduce paradigm for mining interesting and repetitive patterns from a partitioned input graph. A general form of graphs, including directed edges and cycles are handled by our approach. Our primary goal is to address scalability, solve difficult and computationally expensive problems like duplicate elimination, canonical labeling and isomorphism detection in the Map/Reduce framework, without loss of information. Our analysis and experiments show that graphs with hundreds of millions of edges can be handled with acceptable speedup by the algorithm and the approach presented in this paper.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    The canonical k-edge substructure can not distinguish pathological substructures where all node and edge labels in a single substructure are identical and hence cannot distinguish bigger substructures which have a partial substructure with identical labels.

  2. 2.

    This measure returns a superset of substructures using MIS and we can use MIS with additional cost (if needed) over it.

References

  1. http://giraph.apache.org/

  2. Foto, N., Afrati, D.F., Jeffrey, D.U.: Enumerating subgraph instances using map-reduce. Technical report, Stanford University, December 2011

    Google Scholar 

  3. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: Very Large Data Bases, pp. 487–499 (1994)

    Google Scholar 

  4. Alexaki, S., Christophides, V., Karvounarakis, G., Plexousakis, D.: On storing voluminous RDF descriptions: the case of web portal catalogs. In: International Workshop on the Web and Databases, pp. 43–48 (2001)

    Google Scholar 

  5. Bringmann, B., Nijssen, S.: What is frequent in a single graph? In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI), vol. 5012, pp. 858–863. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  6. Chakravarthy, S., Pradhan, S.: DB-FSG: an SQL-based approach for frequent subgraph mining. In: Bhowmick, S.S., Küng, J., Wagner, R. (eds.) DEXA 2008. LNCS, vol. 5181, pp. 684–692. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  7. Das, S., Chakravarthy, S.: Challenges and approaches for large graph analysis using map/reduce paradigm. In: Bhatnagar, V., Srinivasa, S. (eds.) BDA 2013. LNCS, vol. 8302, pp. 116–132. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  8. Deshpande, M., Kuramochi, M., Karypis, G.: Frequent sub-structure-based approaches for classifying chemical compounds. In: IEEE International Conference on Data Mining, pp. 35–42 (2003)

    Google Scholar 

  9. Elseidy, M., Abdelhamid, E., Skiadopoulos, S., Kalnis, P.: GRAMI: frequent subgraph and pattern mining in a single large graph. PVLDB 7(7), 517–528 (2014)

    Google Scholar 

  10. Fiedler, M., Borgelt, C.: Subgraph support in a single large graph. In: ICDM Workshops, pp. 399–404 (2007)

    Google Scholar 

  11. Han, J., Cheng, H., Xin, D., Yan, X.: Frequent pattern mining: current status and future directions. Data Min. Knowl. Discov. 15(1), 55–86 (2007)

    Article  MathSciNet  Google Scholar 

  12. Hill, S., Srichandan, B., Sunderraman, R.: An iterative mapreduce approach to frequent subgraph mining in biological datasets. In: BCB, pp. 661–666 (2012)

    Google Scholar 

  13. Holder, L.B., Cook, D.J., Djoko, S.: Substucture Discovery in the SUBDUE System. In: Knowledge Discovery and Data Mining, pp. 169–180 (1994)

    Google Scholar 

  14. Jiang, C., Coenen, F., Zito, M.: A survey of frequent subgraph mining algorithms. Knowl. Eng. Rev. 28(1), 75–105 (2013)

    Article  Google Scholar 

  15. Jiang, W., Vaidya, J., Balaporia, Z., Clifton, C., Banich, B.: Knowledge discovery from transportation network data. In: ICDE 2005, pp. 1061–1072, April 2005

    Google Scholar 

  16. Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs. J. Parallel Distrib. Comput. 48, 96–129 (1998). Elsevier

    Article  Google Scholar 

  17. Lin, J., Schatz, M.: Design patterns for efficient graph algorithms in MapReduce. In: 16th ACM SIGKDD Conference, pp. 78–85 (2010)

    Google Scholar 

  18. Liu, Y., Jiang, X., Chen, H., Ma, J., Zhang, X.: MapReduce-based pattern finding algorithm applied in motif detection for prescription compatibility network. In: Advanced Parallel Programming Technologies, pp. 341–355 (2009)

    Google Scholar 

  19. Murray, D.G., McSherry, F., Isaacs, R., Isard, M., Barham, P., Abadi, M.: Naiad: a timely dataflow system. In: ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP 2013, pp. 439–455 (2013)

    Google Scholar 

  20. Padmanabhan, S., Chakravarthy, S.: HDB-subdue: a scalable approach to graph mining. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds.) DaWaK 2009. LNCS, vol. 5691, pp. 325–338. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  21. Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.-C.: PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth. In: ICDE, pp. 215–224 (2001)

    Google Scholar 

  22. Rathi, R., Cook, D.J., Holder, L.B.: A serial partitioning approach to scaling graph-based knowledge discovery. In: Russell, I., Markov, Z. (eds.) FLAIRS Conference, pp. 188–193. AAAI Press, Menlo Park (2005)

    Google Scholar 

  23. Yan, X., Han, J.: gSpan: graph-based substructure pattern mining. In: IEEE International Conference on Data Mining, pp. 721–724 (2002)

    Google Scholar 

  24. Zaharia, M., Chowdhury, M., Franklin, M., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: HotCloud (2010)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Soumyava Das .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Das, S., Chakravarthy, S. (2015). Partition and Conquer: Map/Reduce Way of Substructure Discovery. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2015. Lecture Notes in Computer Science(), vol 9263. Springer, Cham. https://doi.org/10.1007/978-3-319-22729-0_28

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-22729-0_28

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22728-3

  • Online ISBN: 978-3-319-22729-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics