Parallel High Average-Utility Itemset Mining Using Better Search Space Division Approach

  • Krishan Kumar Sethi
  • Dharavath Ramesh
  • M. Sreenu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11319)

Abstract

Over the last decade, High Utility Itemset (HUI) mining has emerged as a popular pattern mining approach. HUI mining discovers the itemsets whose profit exceeds a user-defined profit threshold. High Average-Utility Itemset (HAUI) mining improves on HUI mining by taking the length of each itemset into account, which refines the patterns and keeps the mining process fair to itemsets of different lengths. In the era of big data, traditional HAUI mining algorithms are not suitable for processing large transaction datasets on a standalone system because of its limited processing resources. Several distributed frameworks have therefore been developed to process big data on clusters of commodity hardware. This paper presents a parallel version of the traditional HAUI-Miner algorithm, called Parallel High Average-Utility Itemset Miner (PHAUIM). PHAUIM is a Spark-based distributed algorithm that splits the dataset into multiple chunks and distributes them across the cluster nodes so that each chunk is processed in parallel. In addition, an improved approach to search space division is developed. The proposed search space division technique assigns the workload fairly to each node and improves performance. Comprehensive experiments have been performed to measure the performance of PHAUIM in terms of speedup and data scalability. PHAUIM is also compared with the traditional HAUIM.
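
As a rough illustration of the average-utility measure described in the abstract, the following minimal sketch computes it by brute force on a toy dataset. It is not the HAUI-Miner or PHAUIM algorithm and involves no Spark parallelism or search space division. The data, the function names `average_utility` and `mine_hauis`, the absolute threshold, and the "greater than or equal" comparison are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps an item to its purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def average_utility(itemset, transactions, unit_profit):
    """Utility of the itemset summed over the transactions that contain it,
    divided by the number of items in the itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total / len(itemset)


def mine_hauis(transactions, unit_profit, min_avg_utility):
    """Enumerate every itemset and keep those meeting the threshold.
    Exhaustive enumeration is exponential; it only serves to show the measure."""
    items = sorted(unit_profit)
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            au = average_utility(itemset, transactions, unit_profit)
            if au >= min_avg_utility:  # assumed convention: inclusive threshold
                result[itemset] = au
    return result


hauis = mine_hauis(transactions, unit_profit, min_avg_utility=8)
print(hauis)
```

With this toy data, the pair ("a", "b") has a total utility of 16 and therefore an average utility of 8, which shows how dividing by the itemset length keeps longer itemsets from qualifying on length alone.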

Keywords

High average-utility itemset mining · Big data · Apache-Spark · Search space

Notes

Acknowledgment

This research work is supported by Indian Institute of Technology (ISM), Dhanbad. The authors wish to express their gratitude and heartiest thanks to the Department of Computer Science & Engineering, Indian Institute of Technology (ISM), Dhanbad, India for providing their research support.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Indian Institute of Technology (ISM) Dhanbad, Dhanbad, India
  2. Ashoka Institute of Engineering and Technology, Hyderabad, India
