Abstract
Data mining is an essential step of the knowledge discovery in databases (KDD) process. It analyzes large amounts of data from different perspectives and summarizes them into valid, novel, potentially useful, interesting, and previously unknown information. Because extracting knowledge from massive data repositories is so important, data mining has become an essential component in many fields. Association rule mining (ARM) is one of the most important and well-researched data mining techniques. It aims to extract relationships, frequent patterns, and associations among itemsets in transaction databases or other data repositories. Many algorithms have been proposed to find frequent itemsets efficiently. In this research, we chose four well-established frequent itemset mining methods, Apriori, AprioriTID, Eclat, and FP-Growth, and analyzed their performance in a cloud environment. Cloud computing is a new paradigm for analyzing big data efficiently and cost-effectively. In this study, we ran the algorithms on the Amazon Web Services (AWS) platform using the Elastic Compute Cloud (EC2) service. We then compared the four algorithms based on their execution time while varying the minimum support (min_sup) value.
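To make the experimental setup concrete, the following is a minimal single-machine sketch, not the authors' AWS-EC2 implementation, of two of the four methods: level-wise Apriori (horizontal layout, candidate generation and pruning) and Eclat (vertical layout, tid-set intersection), timed while min_sup is varied. The toy basket data, the function names `apriori` and `eclat`, and the choice to express min_sup as an absolute transaction count rather than a fraction are all assumptions for illustration; AprioriTID and FP-Growth are omitted for brevity.

```python
import time
from itertools import combinations


def apriori(transactions, min_sup):
    """Level-wise Apriori. Returns {frozenset(itemset): support_count}.

    Assumption: min_sup is an absolute transaction count, not a fraction.
    """
    db = [frozenset(t) for t in transactions]
    counts = {}
    for t in db:  # first scan: count 1-itemsets
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have size k
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            # Prune step: every (k-1)-subset must already be frequent
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # One full database scan per level to count candidate support
        cand_counts = dict.fromkeys(candidates, 0)
        for t in db:
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1
        current = {s: c for s, c in cand_counts.items() if c >= min_sup}
        frequent.update(current)
        k += 1
    return frequent


def eclat(transactions, min_sup):
    """Depth-first Eclat on a vertical layout (item -> set of transaction ids)."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    frequent = {}

    def extend(prefix, items):
        # Each (item, tids) pair: tids is the tid-set of prefix + {item}
        for i, (item, tids) in enumerate(items):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            suffix = []
            for other, other_tids in items[i + 1:]:
                common = tids & other_tids  # tid-set of itemset + {other}
                if len(common) >= min_sup:
                    suffix.append((other, common))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((i, s) for i, s in tidsets.items() if len(s) >= min_sup),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), start)
    return frequent


# Hypothetical toy market-basket data
transactions = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]

if __name__ == "__main__":
    # Lower min_sup means more frequent itemsets and usually longer runtimes
    for min_sup in (4, 3, 2):
        for name, algo in (("Apriori", apriori), ("Eclat", eclat)):
            t0 = time.perf_counter()
            result = algo(transactions, min_sup)
            elapsed = time.perf_counter() - t0
            print(f"{name:8s} min_sup={min_sup} itemsets={len(result)} time={elapsed:.6f}s")
```

Both functions return the same itemsets with the same support counts; they differ only in how support is computed. Apriori rescans the whole database at every level, whereas Eclat intersects tid-sets, which typically pays off as min_sup drops and more candidates survive. On a five-transaction toy set the timings are dominated by noise and say nothing about the EC2 measurements.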
Acknowledgment
We wish to thank Universiti Kebangsaan Malaysia (UKM) and the Ministry of Higher Education Malaysia for supporting this work through research grant ERGS/1/2013/ICT07/UKM/02/3.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Saabith, A.L.S., Sundararajan, E., Bakar, A.A. (2015). Comparative Analysis of Different Versions of Association Rule Mining Algorithm on AWS-EC2. In: Badioze Zaman, H., et al. Advances in Visual Informatics. IVIC 2015. Lecture Notes in Computer Science, vol. 9429. Springer, Cham. https://doi.org/10.1007/978-3-319-25939-0_6
DOI: https://doi.org/10.1007/978-3-319-25939-0_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25938-3
Online ISBN: 978-3-319-25939-0
eBook Packages: Computer Science, Computer Science (R0)