Abstract
Frequent itemset mining is one of the most popular data mining techniques. Although many algorithms have been developed for this task, there is still room to improve their efficiency. In this paper, we focus on memory use in frequent itemset mining. We propose a new approach in which transactions are represented in a compact graph whose number of nodes equals the number of distinct items in the database. Our experimental results confirm that the approach uses memory efficiently without significantly increasing the execution time of the mining algorithm.
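To make the idea of a graph with one node per distinct item concrete, the following is a minimal sketch and not the paper's algorithm. The abstract does not define the edge labeling, so the sketch assumes a hypothetical scheme: the items of each transaction are sorted, consecutive items are linked by an edge, and each edge is labeled with the set of ids of the transactions that pass through it. The function name `build_item_graph` and the sample transactions are illustrative only.

```python
from collections import defaultdict


def build_item_graph(transactions):
    """Build a graph with exactly one node per distinct item.

    Assumption (not taken from the paper): the items of each transaction
    are sorted, consecutive items are linked by an edge, and the edge
    label is the set of ids of the transactions that traverse it.
    """
    nodes = set()
    edges = defaultdict(set)  # (item_u, item_v) -> {transaction ids}
    for tid, transaction in enumerate(transactions):
        items = sorted(set(transaction))
        nodes.update(items)
        # Link each item to the next one in the sorted transaction.
        for u, v in zip(items, items[1:]):
            edges[(u, v)].add(tid)
    return nodes, dict(edges)


# Hypothetical toy database of four transactions over four items.
transactions = [["a", "b", "c"], ["a", "c"], ["b", "c", "d"], ["a", "b", "c"]]
nodes, edges = build_item_graph(transactions)

# The node count depends only on the number of distinct items,
# not on the number or length of the transactions.
print(len(nodes))
print(sorted(edges[("b", "c")]))
```

In this sketch, adding more transactions over the same four items adds labels to existing edges but never adds nodes, which is the property the abstract highlights. A prefix tree such as an FP-tree, by contrast, may hold several nodes for the same item.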
Notes
1. Edge labeling will be defined later.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Shahbazi, N., Soltani, R., Gryz, J. (2018). Memory Efficient Frequent Itemset Mining. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2018. Lecture Notes in Computer Science, vol 10935. Springer, Cham. https://doi.org/10.1007/978-3-319-96133-0_2
DOI: https://doi.org/10.1007/978-3-319-96133-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96132-3
Online ISBN: 978-3-319-96133-0
eBook Packages: Computer Science, Computer Science (R0)