Abstract
Transactional data mining (decision trees, association rules, etc.) has been used to discover nontrivial patterns in unstructured data. For applications whose data has an inherent structure (such as social networks and phone networks), graph mining is useful because mapping such data into an unstructured representation loses relationships. Graph mining is used in many applications: fraud detection in transaction networks and finding friendships and other characteristics, to name a few. Finding interesting and frequent substructures is central to graph mining in all of these applications. Until now, graph mining has been addressed using main-memory, disk-based, and database-oriented approaches to deal with progressively larger applications.
This paper presents two algorithms that use the Map/Reduce paradigm to mine interesting and repetitive patterns from a partitioned input graph. Our approach handles a general form of graphs, including directed edges and cycles. Our primary goal is to address scalability and to solve difficult, computationally expensive problems such as duplicate elimination, canonical labeling, and isomorphism detection in the Map/Reduce framework without loss of information. Our analysis and experiments show that the algorithms and approach presented in this paper can handle graphs with hundreds of millions of edges with acceptable speedup.
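As a rough illustration of the Map/Reduce style described above (not the paper's algorithms), the following sketch counts 1-edge substructures by a label-only canonical form across graph partitions. The edge tuple layout, the helper names (`canonical_label`, `map_partition`, `reduce_counts`, `frequent_edges`), and the sample partitions are all assumptions made for this example. It also assumes each edge belongs to exactly one partition, so it sidesteps the duplicate elimination that the paper addresses.

```python
from collections import Counter
from itertools import chain

# Hypothetical edge format (an assumption for this sketch):
#   (src_id, src_label, edge_label, dst_id, dst_label)
# Edges are directed; each partition is a list of such edges.

def canonical_label(edge):
    """Describe a directed 1-edge substructure by its labels only (ids dropped)."""
    _, src_label, edge_label, _, dst_label = edge
    return (src_label, edge_label, dst_label)

def map_partition(edges):
    """Map phase: emit (canonical label, 1) for every edge in one partition."""
    return [(canonical_label(e), 1) for e in edges]

def reduce_counts(pairs):
    """Reduce phase: sum the emitted counts per canonical label."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals

def frequent_edges(partitions, min_support):
    """Run map over each partition, reduce globally, keep labels meeting min_support."""
    pairs = chain.from_iterable(map_partition(p) for p in partitions)
    totals = reduce_counts(pairs)
    return {k: v for k, v in totals.items() if v >= min_support}

# Two small, made-up partitions of one labeled directed graph (includes a cycle 4 -> 5 -> 4).
partitions = [
    [(1, "A", "x", 2, "B"), (2, "B", "y", 3, "C")],
    [(4, "A", "x", 5, "B"), (5, "B", "y", 4, "A")],
]
result = frequent_edges(partitions, min_support=2)
```

Here only the substructure `("A", "x", "B")` occurs in both partitions, so it is the only one that meets a support threshold of 2. Expanding to k-edge substructures would require combining edges across partition boundaries and a stronger canonical form, which this sketch does not attempt.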
Notes
1. The canonical k-edge substructure cannot distinguish pathological substructures in which all node and edge labels in a single substructure are identical, and hence cannot distinguish larger substructures that contain a partial substructure with identical labels.
2. This measure returns a superset of the substructures found using MIS; if needed, MIS can be applied on top of it at additional cost.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Das, S., Chakravarthy, S. (2015). Partition and Conquer: Map/Reduce Way of Substructure Discovery. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2015. Lecture Notes in Computer Science, vol 9263. Springer, Cham. https://doi.org/10.1007/978-3-319-22729-0_28
DOI: https://doi.org/10.1007/978-3-319-22729-0_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22728-3
Online ISBN: 978-3-319-22729-0
eBook Packages: Computer Science (R0)