Skip to main content
Log in

TKEH: an efficient algorithm for mining top-k high utility itemsets

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

High utility itemsets mining is a subfield of data mining with wide applications. Although the existing high utility itemsets mining algorithms can discover all the itemsets satisfying a given minimum utility threshold, it is often difficult for users to set a proper minimum utility threshold. A smaller minimum utility threshold value may produce a huge number of itemsets, whereas a higher one may produce a few itemsets. Specification of minimum utility threshold is difficult and time-consuming. To address these issues, top-k high utility itemsets mining has been defined where k is the number of high utility itemsets to be found. In this paper, we present an efficient algorithm (named TKEH) for finding top-k high utility itemsets. TKEH utilizes transaction merging and dataset projection techniques to reduce the dataset scanning cost. These techniques reduce the dataset when larger items are explored. TKEH employs three minimum utility threshold raising strategies. We utilize two strategies to prune search space efficiently. To calculate the utility of items and upper-bounds in linear time, TKEH utilizes array-based utility technique. We carried out some extensive experiments on real datasets. The results show that TKEH outperforms the state-of-the-art algorithms. Moreover, TKEH always performs better for dense datasets.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

References

  1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases, VLDB ’94. Morgan Kaufmann Publishers Inc, San Francisco, pp 487– 499

  2. Ahmed CF, Tanbeer SK, Jeong B-S, Choi H -J (2012) Interactive mining of high utility patterns over data streams. Expert Syst Appl 39(15):11979–11991

    Article  Google Scholar 

  3. Chu C-J, Tseng VS, Liang T (2008) An efficient algorithm for mining temporal high utility itemsets from data streams. J Syst Softw 81(7):1105–1117

    Article  Google Scholar 

  4. Dam T-L, Li K, Fournier-Viger P, Duong Q-H (2017) An efficient algorithm for mining top-k on-shelf high utility itemsets. Knowl Inf Syst 52(3):621–655

    Article  Google Scholar 

  5. Duong Q-H, Liao B, Fournier-Viger P, Dam T-L (2016) An efficient algorithm for mining the top-k high utility itemsets, using novel threshold raising and pruning strategies. Knowl-Based Syst 104:106–122

    Article  Google Scholar 

  6. Fournier-Viger P, Gomariz A, Gueniche T, Soltani A, Wu C-W, Tseng VS (2014a) Spmf: a java open-source pattern mining library. J Mach Learn Res 15(1):3389–3393

  7. Fournier-Viger P, Wu C-W, Zida S, Tseng VS (2014b) FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. Springer International Publishing, Cham, pp 83–92

  8. Fournier-Viger P, Zida S (2015) Foshu: faster on-shelf high utility itemset mining – with or without negative unit profit. In: Proceedings of the 30th annual ACM symposium on applied computing, SAC ’15. ACM, New York, pp 857–864

  9. Krishnamoorthy S (2015) Pruning strategies for mining high utility itemsets. Expert Syst Appl 42(5):2371–2381

    Article  Google Scholar 

  10. Krishnamoorthy S (2017) Hminer: efficiently mining high utility itemsets. Expert Syst Appl 90(Supplement C):168–183

    Article  Google Scholar 

  11. Lee S, Park JS (2016) Top-k high utility itemset mining based on utility-list structures. In: 2016 International conference on big data and smart computing (BigComp), pp 101–108

  12. Li HF, Huang HY, Chen YC, Liu YJ, Lee SY (2008) Fast and memory efficient mining of high utility itemsets in data streams. In: 2008 Eighth IEEE International conference on data mining, pp 881–886

  13. Liu J, Wang K, Fung BCM (2012) Direct discovery of high utility itemsets without candidate generation. In: 2012 IEEE 12th International conference on data mining. Brussels, pp 984–989

  14. Liu M, Qu J (2012) Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM international conference on information and knowledge management, CIKM ’12. ACM, New York, pp 55–64

  15. Liu Y, Liao W -k, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Proceedings of the 9th Pacific-Asia conference on advances in knowledge discovery and data mining, PAKDD’05. Springer, Berlin, pp 689–695

  16. Ryang H, Yun U (2015) Top-k high utility pattern mining with effective threshold raising strategies. Knowl-Based Syst 76:109–126

    Article  Google Scholar 

  17. Shie B-E, Hsiao H-F, Tseng VS, Yu PS (2011) Mining high utility mobile sequential patterns in mobile commerce environments. Springer, Berlin, pp 224–238

    Google Scholar 

  18. Shie B-E, Yu PS, Tseng VS (2013) Mining interesting user behavior patterns in mobile commerce environments. Appl Intell 38(3):418–435

    Article  Google Scholar 

  19. Tseng VS, Shie B-E, Wu C-W, Yu PS (2013) Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Trans on Knowl and Data Eng 25(8):1772–1786

    Article  Google Scholar 

  20. Tseng VS, Wu CW, Fournier-Viger P, Yu PS (2016) Efficient algorithms for mining top-k high utility itemsets. IEEE Trans Knowl Data Eng 28(1):54–67

    Article  Google Scholar 

  21. Tseng VS, Wu C -W, Shie B -E, Yu PS (2010) Up-growth: an efficient algorithm for high utility itemset mining. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’10. ACM, New York, pp 253–262

  22. Wu CW, Shie B-E, Tseng VS, Yu PS (2012) Mining top-k high utility itemsets. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12. ACM, New York, pp 78–86

  23. Yao H, Hamilton HJ (2006) Mining itemset utilities from transaction databases. Data Knowl Eng 59 (3):603–626

    Article  Google Scholar 

  24. Yen S-J, Lee Y-S (2007) Mining high utility quantitative association rules. Springer, Berlin, pp 283–292

    Google Scholar 

  25. Yin J, Zheng Z, Cao L, Song Y, Wei W (2013) Efficiently mining top-k high utility sequential patterns. In: 2013 IEEE 13th International conference on data mining, pp 1259–1264

  26. Yun U, Ryang H, Ryu KH (2014) High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Syst Appl 41(8):3861–3878

    Article  Google Scholar 

  27. Zida S, Fournier-Viger P, Lin JC -W, Wu C -W, Tseng VS (2017) Efim: a fast and memory efficient algorithm for high-utility itemset mining. Knowl Inf Syst 51(2):595–625

    Article  Google Scholar 

  28. Zihayat M, An A (2014) Mining top-k high utility patterns over data streams. Inf Sci 285:138–161. Processing and Mining Complex Data Streams

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Kuldeep Singh.

Ethics declarations

The article uses threshold raising and memory reduction techniques to mine the top-k high utility itemsets mining.

Conflict of interests

The authors declare no conflicts of interest.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Singh, K., Singh, S.S., Kumar, A. et al. TKEH: an efficient algorithm for mining top-k high utility itemsets. Appl Intell 49, 1078–1097 (2019). https://doi.org/10.1007/s10489-018-1316-x

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-018-1316-x

Keywords

Navigation