The Merits of Bitset Compression Techniques for Mining Association Rules from Big Data

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 891)

Abstract

The massive amounts of data generated today may not be harnessed unless efficient, high-performance processing techniques are employed. Researchers therefore actively pursue continuous improvements in data mining algorithms and their implementations. One widely applied big data mining task is the extraction of association rules from transactional datasets. ECLAT is an algorithm that mines frequent itemsets as a basis for finding such rules. Because it operates on a vertical representation of the dataset, its implementation may be significantly enhanced by employing sparse bitset compression. This paper studies the performance of four bitset compression techniques proposed by researchers, using both real-world and synthetic big datasets. The effect of input data characteristics on these compression methods is analyzed in terms of energy consumption, performance, and memory usage. The experimental results can guide implementers in choosing the compression method that best fits the problem requirements. The source code of this study is available at https://github.com/fadishei/biteclat.
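
To make the vertical representation concrete, the following is a minimal sketch of ECLAT-style frequent itemset mining over plain, uncompressed bitsets. It is not the authors' implementation and uses none of the compression libraries evaluated in the paper: Python integers stand in for bitsets, and the names `vertical_bitsets`, `eclat`, and `min_support` are assumptions chosen for illustration. Each item maps to a bitset whose bit t is set when transaction t contains the item, so the support of an itemset is the population count of the bitwise AND of its members' bitsets.

```python
from typing import Dict, FrozenSet, List, Tuple


def vertical_bitsets(transactions: List[List[str]]) -> Dict[str, int]:
    """Build the vertical layout: item -> bitset (a Python int) in which
    bit t is set when transaction t contains the item."""
    bitsets: Dict[str, int] = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            bitsets[item] = bitsets.get(item, 0) | (1 << tid)
    return bitsets


def support(bits: int) -> int:
    """Number of transactions in a bitset (population count)."""
    return bin(bits).count("1")


def eclat(transactions: List[List[str]],
          min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset contained in at least min_support
    transactions, together with its support count."""
    frequent: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str],
               candidates: List[Tuple[str, int]]) -> None:
        # Each candidate pairs an item with the bitset of (prefix + item);
        # every candidate passed in is already known to be frequent.
        for i, (item, bits) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = support(bits)
            # Intersect with the remaining candidates to form the next level.
            next_level = []
            for other, other_bits in candidates[i + 1:]:
                joined = bits & other_bits
                if support(joined) >= min_support:
                    next_level.append((other, joined))
            if next_level:
                extend(itemset, next_level)

    singles = sorted(
        (item, bits)
        for item, bits in vertical_bitsets(transactions).items()
        if support(bits) >= min_support
    )
    extend(frozenset(), singles)
    return frequent


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "butter"],
        ["bread", "milk", "butter"],
        ["milk"],
    ]
    for itemset, count in eclat(baskets, min_support=2).items():
        print(sorted(itemset), count)
```

In this sketch every bitset occupies one bit per transaction regardless of how rare the item is. That cost on sparse data is what the compressed bitset formats compared in the paper aim to reduce, while still supporting the intersection and counting operations used above.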

Author information

Corresponding author

Correspondence to Hamid Fadishei.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Fadishei, H., Doustian, S., Saadati, P. (2019). The Merits of Bitset Compression Techniques for Mining Association Rules from Big Data. In: Grandinetti, L., Mirtaheri, S., Shahbazian, R. (eds) High-Performance Computing and Big Data Analysis. TopHPC 2019. Communications in Computer and Information Science, vol 891. Springer, Cham. https://doi.org/10.1007/978-3-030-33495-6_10

  • DOI: https://doi.org/10.1007/978-3-030-33495-6_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-33494-9

  • Online ISBN: 978-3-030-33495-6

  • eBook Packages: Computer Science, Computer Science (R0)
