Abstract
The massive amounts of data generated today may not be harnessed unless efficient, high-performance processing techniques are employed. Researchers therefore actively pursue continuous improvement in data mining algorithms and their implementations. One widely applied big data mining task is the extraction of association rules from transactional datasets. ECLAT is an algorithm that mines frequent itemsets as a basis for finding such rules. Because this algorithm operates on a vertical representation of the dataset, its implementation may be significantly enhanced by sparse bitset compression. This paper studies the performance of four bitset compression techniques proposed by researchers, using both real-world and synthetic large datasets. The effect of input data characteristics on these compression methods is analyzed in terms of energy consumption, performance, and memory usage. The experimental results can guide implementers in choosing the compression method that best fits the problem requirements. The source code of this study is available at https://github.com/fadishei/biteclat.
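To make the vertical representation concrete, the following is a minimal sketch of ECLAT-style mining in which each item maps to a bitset of the transactions containing it, and the support of an itemset is the population count of the intersected bitsets. It is an illustration only, not the implementation from the repository above: plain Python integers stand in for the compressed bitsets the paper compares, and the names `vertical_bitsets`, `popcount`, and `eclat` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


def vertical_bitsets(transactions: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Map each item to an int whose bit t is set when transaction t contains it."""
    bitsets: Dict[str, int] = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            bitsets[item] = bitsets.get(item, 0) | (1 << tid)
    return bitsets


def popcount(bits: int) -> int:
    """Number of set bits, i.e. the support of the itemset the bitset represents."""
    return bin(bits).count("1")


def eclat(transactions: List[List[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose support is at least min_support (assumed >= 1)."""
    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], candidates: List[Tuple[str, int]]) -> None:
        for i, (item, bits) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = popcount(bits)
            # Intersect with the remaining candidates to grow the itemset by one item.
            next_candidates = []
            for other, other_bits in candidates[i + 1:]:
                joined = bits & other_bits
                if popcount(joined) >= min_support:
                    next_candidates.append((other, joined))
            if next_candidates:
                extend(itemset, next_candidates)

    # Start from the frequent single items, in a fixed order to avoid duplicates.
    initial = sorted(
        (item, bits)
        for item, bits in vertical_bitsets(transactions).items()
        if popcount(bits) >= min_support
    )
    extend(frozenset(), initial)
    return frequent


if __name__ == "__main__":
    baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(eclat(baskets, 3).items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), support)
```

In this sketch the bitwise AND and the population count dominate the work, which is why the choice of bitset representation matters: a compressed bitset would replace the Python integer while the recursion stays the same.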
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Fadishei, H., Doustian, S., Saadati, P. (2019). The Merits of Bitset Compression Techniques for Mining Association Rules from Big Data. In: Grandinetti, L., Mirtaheri, S., Shahbazian, R. (eds) High-Performance Computing and Big Data Analysis. TopHPC 2019. Communications in Computer and Information Science, vol 891. Springer, Cham. https://doi.org/10.1007/978-3-030-33495-6_10
DOI: https://doi.org/10.1007/978-3-030-33495-6_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33494-9
Online ISBN: 978-3-030-33495-6
eBook Packages: Computer Science, Computer Science (R0)