Partition and Conquer: Map/Reduce Way of Substructure Discovery

Das, Soumyava; Chakravarthy, Sharma

doi:10.1007/978-3-319-22729-0_28

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9263)

Abstract

Transactional data mining (decision trees, association rules, etc.) has been used to discover nontrivial patterns in unstructured data. For applications with an inherent structure (such as social networks and phone networks), graph mining is useful because mapping such data into an unstructured representation loses relationships. Graph mining is used in many applications; detecting fraud in transaction networks and finding friendships and other characteristics are just a few. Finding interesting and frequent substructures is central to graph mining in all of these applications. Until now, graph mining has been addressed using main-memory, disk-based, and database-oriented approaches to deal with progressively larger applications.

This paper presents two algorithms using the Map/Reduce paradigm for mining interesting and repetitive patterns from a partitioned input graph. Our approach handles a general form of graphs, including directed edges and cycles. Our primary goal is to address scalability and to solve difficult and computationally expensive problems, such as duplicate elimination, canonical labeling, and isomorphism detection, in the Map/Reduce framework without loss of information. Our analysis and experiments show that the algorithms and approach presented in this paper can handle graphs with hundreds of millions of edges with acceptable speedup.
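
As a rough illustration of the general idea (not the paper's two algorithms, which the abstract does not spell out), the sketch below counts 1-edge substructures over a partitioned graph in a Map/Reduce style. The edge tuple layout, the function names, and the choice to replicate a boundary edge in two partitions are all assumptions made for this example; the label triple used as the key is only a stand-in for a real canonical label.

```python
from collections import defaultdict

# Hypothetical input: each partition is a list of labeled directed edges
# (src_id, src_label, edge_label, dst_id, dst_label). The edge 2 -> 3 is
# deliberately replicated in both partitions, as a boundary edge might be.
partitions = [
    [(1, "A", "x", 2, "B"), (2, "B", "y", 3, "C")],
    [(4, "A", "x", 5, "B"), (2, "B", "y", 3, "C"), (5, "B", "y", 4, "A")],
]

def map_partition(edges):
    """Map: emit (label-based key, instance) for every 1-edge substructure."""
    for src, src_label, edge_label, dst, dst_label in edges:
        yield (src_label, edge_label, dst_label), (src, dst)

def shuffle(pairs):
    """Group emitted values by key, as the Map/Reduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_group(key, instances):
    """Reduce: eliminate duplicate instances, then count what remains."""
    return key, len(set(instances))

def count_one_edge_substructures(parts):
    pairs = [kv for part in parts for kv in map_partition(part)]
    return dict(reduce_group(k, v) for k, v in shuffle(pairs).items())

counts = count_one_edge_substructures(partitions)
```

In this toy input the replicated edge contributes to its key only once, which is the duplicate-elimination step in miniature. Expanding to larger substructures and checking isomorphism are the hard parts that this sketch leaves out.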

Notes

1. The canonical k-edge substructure cannot distinguish pathological substructures in which all node and edge labels in a single substructure are identical, and hence cannot distinguish bigger substructures that contain a partial substructure with identical labels.
2. This measure returns a superset of the substructures found using MIS; MIS can be applied on top of it, at additional cost, if needed.
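
The kind of ambiguity described in note 1 can be illustrated with a deliberately simplified, label-only form. This is an assumption made for illustration and is not the paper's canonical k-edge representation. The helper names and the two small graphs below are hypothetical.

```python
from collections import Counter

def label_only_form(edges, node_labels):
    """Sorted (source label, edge label, target label) triples.

    A simplified stand-in for a canonical form: it ignores which
    node instances the edges actually connect.
    """
    return tuple(sorted((node_labels[s], lbl, node_labels[d]) for s, lbl, d in edges))

def out_degree_sequence(edges, node_labels):
    """Sorted out-degrees of all nodes, a simple isomorphism invariant."""
    degrees = Counter(s for s, _, _ in edges)
    return sorted(degrees.get(n, 0) for n in node_labels)

# Three nodes that all carry the same label, and two edges with the same label.
labels = {1: "A", 2: "A", 3: "A"}
chain = [(1, "x", 2), (2, "x", 3)]  # 1 -> 2 -> 3
fork = [(1, "x", 2), (1, "x", 3)]   # 1 -> 2 and 1 -> 3

same_form = label_only_form(chain, labels) == label_only_form(fork, labels)
same_shape = out_degree_sequence(chain, labels) == out_degree_sequence(fork, labels)
```

Here `same_form` is true while `same_shape` is false: the two substructures are not isomorphic, yet a form built from labels alone cannot tell them apart.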

Author information

Authors and Affiliations

ITLAB and CSE Department, University of Texas at Arlington, Arlington, Texas
Soumyava Das & Sharma Chakravarthy

Corresponding author

Correspondence to Soumyava Das.

Editor information

Editors and Affiliations

University of Science and Technology, Rolla, Missouri, USA
Sanjay Madria
Osaka University, Osaka, Japan
Takahiro Hara

About this paper

Cite this paper

Das, S., Chakravarthy, S. (2015). Partition and Conquer: Map/Reduce Way of Substructure Discovery. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2015. Lecture Notes in Computer Science, vol 9263. Springer, Cham. https://doi.org/10.1007/978-3-319-22729-0_28

DOI: https://doi.org/10.1007/978-3-319-22729-0_28
Published: 05 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22728-3
Online ISBN: 978-3-319-22729-0
eBook Packages: Computer Science, Computer Science (R0)
