Abstract
As data in many fields are digitized, data volumes are growing rapidly. Mining such large volumes raises two major issues. The first is the capacity to process huge data sets, which can be addressed with parallel algorithms, since serial algorithms may take a very long time or may fail to complete. The second is I/O overhead, which can be reduced by memory mapping of files. This chapter combines parallelization and memory-mapped files for mining frequent itemsets. Our experiments showed almost 20% more speedup when our parallel frequent itemset mining algorithm used memory mapping than when it used conventional I/O without memory mapping.
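The combination described above can be illustrated with a minimal Python sketch. It is not the chapter's implementation, which this preview does not show. It assumes a transaction file with one transaction per line and whitespace-separated items, and it covers only the first counting pass (support of single items). The names `chunk_bounds`, `count_items`, and `frequent_items` are hypothetical and introduced here for illustration.

```python
import mmap
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(buf, n_chunks):
    """Split buf into (start, end) byte ranges that end on line boundaries."""
    size = len(buf)
    bounds, start = [], 0
    for i in range(1, n_chunks + 1):
        if start >= size:
            break
        # Aim for equal-sized chunks, then extend to the next newline.
        end = size if i == n_chunks else max(start, size * i // n_chunks)
        if end < size:
            nl = buf.find(b"\n", end)
            end = size if nl == -1 else nl + 1
        bounds.append((start, end))
        start = end
    return bounds


def count_items(buf, start, end):
    """Count, for each item, the transactions in buf[start:end] containing it."""
    counts = Counter()
    for line in buf[start:end].splitlines():
        counts.update(set(line.split()))  # each transaction counts an item once
    return counts


def frequent_items(path, min_support, n_workers=2):
    """Return {item: support} for items found in at least min_support transactions."""
    # Assumption: the file is non-empty (mmap rejects empty files on some platforms).
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        bounds = chunk_bounds(buf, n_workers)
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            partials = pool.map(lambda b: count_items(buf, *b), bounds)
            total = sum(partials, Counter())  # merge the per-chunk counts
    return {item.decode(): n for item, n in total.items() if n >= min_support}
```

Each worker reads its own byte range directly from the mapping, so no worker issues explicit read calls or shares a file offset with another. In CPython the threads are serialized by the global interpreter lock for this kind of work, so the sketch shows the structure of the approach rather than the speedup reported above.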
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Anuradha, T. (2018). Parallel Mining of Frequent Itemsets from Memory-Mapped Files. In: Aggarwal, V., Bhatnagar, V., Mishra, D. (eds) Big Data Analytics. Advances in Intelligent Systems and Computing, vol 654. Springer, Singapore. https://doi.org/10.1007/978-981-10-6620-7_43
DOI: https://doi.org/10.1007/978-981-10-6620-7_43
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6619-1
Online ISBN: 978-981-10-6620-7
eBook Packages: Engineering, Engineering (R0)