Abstract
Graph mining has practical applications in many areas, such as molecular substructure exploration, web link analysis, fraud detection, outlier detection, chemical molecule analysis, and social network analysis. Frequent subgraph mining, an important topic in graph mining, aims to find all frequent subgraphs in a collection of graphs. Numerous algorithms for mining frequent subgraphs have been proposed; most of them, however, use sequential strategies that do not scale to large datasets. In this paper, we propose a parallel algorithm to overcome this weakness. First, we introduce the multi-core processor architecture and discuss how to apply it to data mining. Second, we present the gSpan algorithm as the basic framework of our algorithm. Finally, we develop an efficient algorithm for mining frequent subgraphs that relies on parallel computing. The performance and scalability of the proposed algorithm are illustrated through extensive experiments on two datasets, chemical and compound.
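To make the notion of a frequent subgraph concrete, the following is a minimal sketch, not the algorithm proposed in the paper and not gSpan itself. It counts the support of single-edge patterns only, where support is the number of graphs in the collection that contain the pattern, and it spreads the per-graph work over a worker pool. The graph representation, the function names `edge_patterns` and `frequent_edges`, and the toy `DATABASE` are all assumptions made for illustration. A thread pool is used here only to keep the sketch self-contained; real multi-core speedup in Python would typically need process-based workers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy representation (an assumption, not from the paper):
# a graph is (vertex_labels, edges), where vertex_labels maps a vertex id
# to its label and edges is a list of (u, v, edge_label) tuples.

def edge_patterns(graph):
    """Return the distinct labeled-edge patterns that occur in one graph."""
    vertex_labels, edges = graph
    patterns = set()
    for u, v, edge_label in edges:
        # Sort the endpoint labels so an undirected edge has one canonical form.
        a, b = sorted((vertex_labels[u], vertex_labels[v]))
        patterns.add((a, edge_label, b))
    return patterns

def frequent_edges(graphs, min_support, workers=4):
    """Return single-edge patterns contained in at least min_support graphs."""
    # Each graph is scanned independently, so the scans can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_graph = list(pool.map(edge_patterns, graphs))
    # A pattern counts once per graph, however often it occurs inside it.
    support = Counter()
    for patterns in per_graph:
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}

# Three small molecule-like graphs, invented for this example.
DATABASE = [
    ({0: "C", 1: "O", 2: "C"}, [(0, 1, "single"), (0, 2, "single")]),
    ({0: "C", 1: "O"}, [(0, 1, "double")]),
    ({0: "O", 1: "C", 2: "N"}, [(0, 1, "single"), (1, 2, "single")]),
]

if __name__ == "__main__":
    print(frequent_edges(DATABASE, min_support=2))
```

With a minimum support of 2, only the carbon-oxygen single bond qualifies in this toy collection, because it appears in the first and third graphs. A full miner would go on to grow such patterns into larger subgraphs and would have to handle subgraph isomorphism, which this sketch deliberately leaves out.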
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Vo, B., Nguyen, D., Nguyen, TL. (2015). A Parallel Algorithm for Frequent Subgraph Mining. In: Le Thi, H., Nguyen, N., Do, T. (eds) Advanced Computational Methods for Knowledge Engineering. Advances in Intelligent Systems and Computing, vol 358. Springer, Cham. https://doi.org/10.1007/978-3-319-17996-4_15
DOI: https://doi.org/10.1007/978-3-319-17996-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17995-7
Online ISBN: 978-3-319-17996-4
eBook Packages: Engineering, Engineering (R0)