Prefix Tree Based MapReduce Approach for Mining Frequent Subgraphs

Movva, Supriya; Prata, Saketh; Sampath, Sai; Gayathri, R. G.

doi:10.1007/978-3-030-20615-4_17

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 276)

Included in the following conference series:

International Conference on Ubiquitous Communications and Network Computing

Abstract

Frequent subgraphs are subgraphs whose number of occurrences in a graph dataset is greater than or equal to a user-defined threshold. Many algorithms assume that an Apriori-based approach finds frequent subgraphs efficiently, but our research found that the Apriori algorithm does not scale with main memory. Frequent subgraph mining using the Apriori algorithm with an FS-tree, a prefix tree data structure, uses an adjacency list representation and runs in two phases. In the first phase, it uses the Apriori algorithm to find frequent two-edge subgraphs. In the second phase, it uses the FS-tree algorithm to search for all frequent subgraphs, starting from the frequent two-edge subgraphs. The drawback of the Apriori algorithm is that it scans the dataset for every candidate, so the Apriori algorithm with FS-tree is used to avoid these repeated scans. That algorithm, however, is also implemented on the assumption that the dataset fits in memory. In this paper, we propose a parallel MapReduce-based frequent subgraph mining technique that runs in a distributed environment on the Hadoop framework. The experiments validate the efficiency of the algorithm for generating frequent subgraphs in large graph datasets.
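
The first phase described above, counting the support of two-edge subgraphs against a threshold, can be pictured in map-reduce terms with the short Python sketch below. It is an illustration only and not the authors' implementation: it runs in a single process rather than on Hadoop, it assumes that vertex labels are unique within each graph (so no isomorphism test is needed), and the function names `map_two_edge_subgraphs`, `shuffle`, `reduce_support` and `frequent_two_edge_subgraphs`, as well as the toy dataset, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def canonical_edge(u, v):
    """Order the endpoints so that an undirected edge has one representation."""
    return (u, v) if u <= v else (v, u)


def map_two_edge_subgraphs(graph_edges):
    """Map step: emit (pattern, 1) once per distinct two-edge subgraph of one graph.

    Assumption: vertex labels are unique within a graph, so a sorted pair of
    canonical edges identifies the pattern without an isomorphism check.
    """
    edges = sorted({canonical_edge(u, v) for u, v in graph_edges})
    patterns = set()
    for e1, e2 in combinations(edges, 2):
        if set(e1) & set(e2):  # the two edges share a vertex, so they are connected
            patterns.add((e1, e2))
    return [(pattern, 1) for pattern in patterns]


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_support(grouped, min_support):
    """Reduce step: sum the counts per pattern and keep those meeting the threshold."""
    supports = {pattern: sum(counts) for pattern, counts in grouped.items()}
    return {p: s for p, s in supports.items() if s >= min_support}


def frequent_two_edge_subgraphs(dataset, min_support):
    """Run map, shuffle and reduce over a list of graphs (each a list of edges)."""
    emitted = [pair for graph in dataset for pair in map_two_edge_subgraphs(graph)]
    return reduce_support(shuffle(emitted), min_support)


# Toy dataset (hypothetical): three small graphs given as edge lists.
dataset = [
    [("A", "B"), ("B", "C"), ("C", "D")],
    [("B", "A"), ("B", "C")],
    [("A", "B"), ("C", "D")],
]
result = frequent_two_edge_subgraphs(dataset, min_support=2)
print(result)
```

With a threshold of 2, only the pattern made of edges A-B and B-C survives, because it occurs in the first two graphs; the pattern B-C, C-D occurs in one graph and is filtered out. Because each graph emits a pattern at most once, the count is the number of graphs containing the pattern.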

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Amrita School of Engineering, Amritapuri, Amrita Vishwa Vidyapeetham, Coimbatore, India
Supriya Movva, Saketh Prata, Sai Sampath & R. G. Gayathri

Corresponding author

Correspondence to Saketh Prata.

Editor information

Editors and Affiliations

Amrita University, Bangalore, Karnataka, India
Navin Kumar
Delft University of Technology, Delft, The Netherlands
R. Venkatesha Prasad

About this paper

Cite this paper

Movva, S., Prata, S., Sampath, S., Gayathri, R.G. (2019). Prefix Tree Based MapReduce Approach for Mining Frequent Subgraphs. In: Kumar, N., Venkatesha Prasad, R. (eds) Ubiquitous Communications and Network Computing. UBICNET 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 276. Springer, Cham. https://doi.org/10.1007/978-3-030-20615-4_17

DOI: https://doi.org/10.1007/978-3-030-20615-4_17
Published: 16 May 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20614-7
Online ISBN: 978-3-030-20615-4
eBook Packages: Computer Science, Computer Science (R0)
