Mining Closed Colossal Frequent Patterns from High-Dimensional Dataset: Serial Versus Parallel Framework

Sureshan, Sudeep; Penumacha, Anusha; Jain, Siddharth; Vanahalli, Manjunath; Patil, Nagamma

doi:10.1007/978-981-10-3373-5_32

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 518)

Abstract

Mining colossal patterns is an emerging field with many applications, especially in bioinformatics and genetics. Gene sequences contain inherent information, and mining colossal patterns in such sequences can aid their study and improve prediction accuracy. As the average transaction length increases, existing closed frequent pattern mining algorithms become less efficient and less effective. Traditional algorithms spend most of their running time mining a huge number of small and mid-sized patterns that do not carry valuable information. Recent research has therefore focused on mining patterns of large cardinality, called colossal patterns, which do carry valuable information. A novel parallel algorithm is proposed to extract closed colossal frequent patterns from high-dimensional datasets. The algorithm is implemented on the Hadoop framework to exploit its inherent distributed parallelism through the MapReduce programming model. The experimental results show that the proposed parallel algorithm on the Hadoop framework performs efficiently in terms of execution time compared with the existing algorithms.
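To make the terminology concrete, the following is a minimal Python sketch of what "closed colossal frequent pattern" means on a toy dataset. It is not the algorithm proposed in the chapter and uses neither Hadoop nor MapReduce. It is a brute-force row enumeration that relies on the standard fact that an itemset is closed exactly when it equals the intersection of all rows containing it. The function name, the `min_cardinality` threshold, and the toy rows are assumptions introduced only for illustration.

```python
from itertools import combinations


def closed_colossal_patterns(rows, min_support, min_cardinality):
    """Brute-force sketch (exponential in the number of rows).

    Intersects every group of at least `min_support` rows. Each non-empty
    intersection is a closed itemset; it is kept when it has at least
    `min_cardinality` items (the hypothetical "colossal" threshold).
    Returns a dict mapping each pattern to its support count.
    """
    rows = [frozenset(r) for r in rows]
    found = {}
    # Start at 1 so that frozenset.intersection always gets an argument.
    for k in range(max(1, min_support), len(rows) + 1):
        for group in combinations(rows, k):
            pattern = frozenset.intersection(*group)
            if len(pattern) >= min_cardinality and pattern not in found:
                # Support = number of rows that contain the whole pattern.
                found[pattern] = sum(1 for r in rows if pattern <= r)
    return found


# Toy data: few rows, comparatively many items per row.
demo_rows = [
    {"a", "b", "c", "d", "e"},
    {"a", "b", "c", "d", "f"},
    {"a", "b", "c", "g"},
    {"h", "i"},
]
result = closed_colossal_patterns(demo_rows, min_support=2, min_cardinality=3)
for pattern, support in sorted(result.items(), key=lambda kv: -len(kv[0])):
    print(sorted(pattern), support)
```

On this toy input the sketch reports `{a, b, c, d}` with support 2 and `{a, b, c}` with support 3. Enumerating row groups rather than item groups is a natural fit when rows are few and items are many, but the brute-force loop here is only practical for very small inputs.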

Author information

Authors and Affiliations

National Institute of Technology Karnataka, Surathkal, Mangalore, 575025, Karnataka, India
Sudeep Sureshan, Anusha Penumacha, Siddharth Jain, Manjunath Vanahalli & Nagamma Patil

Corresponding author

Correspondence to Sudeep Sureshan.

Editor information

Editors and Affiliations

Dept. of Computer Science & Engineering, National Institute of Technology, Rourkela, Odisha, India
Pankaj Kumar Sa
Dept. of Computer Science & Engineering, National Institute of Technology, Rourkela, Odisha, India
Manmath Narayan Sahoo
School of Mechatronics Engineering, Universiti Malaysia Perlis (UniMAP), Arau, Perlis, Malaysia
M. Murugappan
Lecturer, The University of Exeter, Exeter, Devon, United Kingdom
Yulei Wu
Dept. of Computer Science & Engineering, National Institute of Technology, Rourkela, Odisha, India
Banshidhar Majhi

About this paper

Cite this paper

Sureshan, S., Penumacha, A., Jain, S., Vanahalli, M., Patil, N. (2018). Mining Closed Colossal Frequent Patterns from High-Dimensional Dataset: Serial Versus Parallel Framework. In: Sa, P., Sahoo, M., Murugappan, M., Wu, Y., Majhi, B. (eds) Progress in Intelligent Computing Techniques: Theory, Practice, and Applications. Advances in Intelligent Systems and Computing, vol 518. Springer, Singapore. https://doi.org/10.1007/978-981-10-3373-5_32

DOI: https://doi.org/10.1007/978-981-10-3373-5_32
Published: 13 July 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3372-8
Online ISBN: 978-981-10-3373-5
eBook Packages: Engineering, Engineering (R0)
