Applied Intelligence, Volume 49, Issue 3, pp 1078–1097

TKEH: an efficient algorithm for mining top-k high utility itemsets

  • Kuldeep Singh
  • Shashank Sheshar Singh
  • Ajay Kumar
  • Bhaskar Biswas

Abstract

High utility itemset mining is a subfield of data mining with wide applications. Although existing high utility itemset mining algorithms can discover all itemsets that satisfy a given minimum utility threshold, users often find it difficult to set that threshold properly. A threshold that is too low may produce a huge number of itemsets, whereas one that is too high may produce only a few. Specifying the minimum utility threshold is therefore difficult and time-consuming. To address these issues, top-k high utility itemset mining has been defined, where k is the number of high utility itemsets to be found. In this paper, we present an efficient algorithm, named TKEH, for finding the top-k high utility itemsets. TKEH uses transaction merging and dataset projection techniques to reduce the cost of scanning the dataset. These techniques reduce the dataset when larger items are explored. TKEH employs three strategies for raising the minimum utility threshold and two strategies for pruning the search space efficiently. To calculate the utility of items and the upper bounds in linear time, TKEH uses an array-based utility technique. We carried out extensive experiments on real datasets. The results show that TKEH outperforms the state-of-the-art algorithms. Moreover, TKEH always performs better for dense datasets.
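The problem statement in the abstract can be made concrete with a small sketch. The code below is not TKEH: it enumerates every itemset by brute force and applies none of the paper's merging, projection, or pruning techniques. It only illustrates what "top-k high utility itemsets" means and how an internal minimum utility threshold can rise from 0 as better itemsets are found. It assumes that each transaction is a dictionary mapping an item to its utility in that transaction; the names `itemset_utility`, `top_k_high_utility_itemsets`, and the sample data are hypothetical.

```python
import heapq
from itertools import combinations


def itemset_utility(itemset, transactions):
    """Total utility of an itemset: sum over the transactions that
    contain every item of the itemset."""
    total = 0
    for transaction in transactions:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] for item in itemset)
    return total


def top_k_high_utility_itemsets(transactions, k):
    """Brute-force top-k search (illustration only, exponential cost).

    Returns the k itemsets with the highest utility, best first,
    together with the final value of the internal threshold.
    """
    items = sorted({item for t in transactions for item in t})
    heap = []      # min-heap holding the best k (utility, itemset) pairs
    min_util = 0   # internal threshold, starts at 0 and only rises

    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions)
            if len(heap) < k:
                heapq.heappush(heap, (utility, itemset))
            elif utility > min_util:
                # Replace the weakest of the current k itemsets.
                heapq.heapreplace(heap, (utility, itemset))
            if len(heap) == k:
                # Threshold raising: the k-th best utility seen so far.
                min_util = heap[0][0]

    ranked = sorted(heap, key=lambda pair: (-pair[0], pair[1]))
    return ranked, min_util


# Hypothetical toy dataset: item -> utility of that item in the transaction.
transactions = [
    {"a": 5, "b": 10},
    {"a": 10, "c": 7},
    {"a": 5, "b": 4, "c": 3},
]

ranked, min_util = top_k_high_utility_itemsets(transactions, k=3)
for utility, itemset in ranked:
    print(itemset, utility)
print("final threshold:", min_util)
```

With this toy data and k = 3, the user never supplies a threshold; the internal threshold climbs from 0 to the utility of the third-best itemset. A practical algorithm would use the rising threshold together with utility upper bounds to skip most candidates instead of enumerating them all.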

Keywords

High utility itemsets; Utility mining; Itemset mining; Top-k itemset mining; Threshold raising strategies

Notes

Compliance with Ethical Standards

The article uses threshold raising and memory reduction techniques to mine the top-k high utility itemsets.

Conflict of interests

The authors declare no conflicts of interest.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Kuldeep Singh (1), email author
  • Shashank Sheshar Singh (1)
  • Ajay Kumar (1)
  • Bhaskar Biswas (1)

  1. Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, India