Mining constrained inter-sequence patterns: a novel approach to cope with item constraints

  • Tuong Le
  • Anh Nguyen
  • Bao Huynh
  • Bay Vo
  • Witold Pedrycz

Abstract

Data mining has become increasingly important in the Internet era. Mining inter-sequence patterns is a sub-task of data mining for which several algorithms have been proposed in recent years. However, these algorithms focus only on the traditional problem of mining frequent inter-sequence patterns, and most frequent inter-sequence patterns are either redundant or insignificant. As a result, they can confuse end users during decision-making and can consume excessive system resources. This has led to the problem of mining inter-sequence patterns with item constraints, which addresses the case where end users are concerned only with patterns that contain a number of specific items. In this paper, we propose two novel algorithms for this problem. The first is ISP-IC (Inter-Sequence Pattern with Item Constraint mining), which is based on a theorem that quickly determines whether an inter-sequence pattern satisfies the constraints. We then propose an improvement to the strategy of ISP-IC, which is applied in the \(i\)ISP-IC algorithm to enhance performance. Finally, \(pi\)ISP-IC, a parallel version of \(i\)ISP-IC, is presented. Experimental results show that \(pi\)ISP-IC outperforms post-processing of the state-of-the-art method for mining inter-sequence patterns (EISP-Miner), as well as ISP-IC and \(i\)ISP-IC, in most cases.
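
To make the notion of an item constraint concrete, the following is a minimal sketch of the post-processing baseline mentioned above: mine all frequent patterns first, then keep only those containing the items of interest. It is not the ISP-IC theorem, which the abstract does not state. The pattern representation (a tuple of `(item, offset)` pairs), the function names, and the `min_matches` parameter are all assumptions made for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical representation: a pattern is a tuple of (item, offset) pairs,
# where offset is the relative position of the sequence the item comes from.
Pattern = Tuple[Tuple[str, int], ...]


def satisfies_item_constraint(
    pattern: Pattern,
    constraint_items: FrozenSet[str],
    min_matches: int = 1,
) -> bool:
    """Return True if the pattern contains at least `min_matches`
    distinct items from `constraint_items` (assumed semantics)."""
    items_in_pattern = {item for item, _offset in pattern}
    return len(items_in_pattern & constraint_items) >= min_matches


def post_process(
    patterns: Iterable[Pattern],
    constraint_items: FrozenSet[str],
    min_matches: int = 1,
) -> List[Pattern]:
    """Filter already-mined patterns, keeping only those that
    satisfy the item constraint. Input order is preserved."""
    return [
        p for p in patterns
        if satisfies_item_constraint(p, constraint_items, min_matches)
    ]


# Toy input: three already-mined patterns.
mined = [
    (("a", 0), ("b", 1)),
    (("c", 0),),
    (("b", 0), ("d", 2)),
]
kept = post_process(mined, frozenset({"b"}))
```

Because this filter runs only after every frequent pattern has been generated, it pays the full mining cost regardless of how selective the constraint is; checking the constraint during mining, as the proposed algorithms aim to do, avoids that cost.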

Keywords

  • Data mining
  • Pattern mining
  • Inter-sequence pattern mining
  • Constraint mining
  • Parallel mining

Notes

Acknowledgments

This research is funded by the Foundation for Science and Technology Development of Ton Duc Thang University (FOSTECT), website: http://fostect.tdt.edu.vn, under Grant FOSTECT.2015.BR.01.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Tuong Le (1, 2)
  • Anh Nguyen (3)
  • Bao Huynh (4, 5)
  • Bay Vo (6)
  • Witold Pedrycz (7, 8, 9)

  1. Division of Data Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
  2. Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
  3. Institute of Research and Development, Duy Tan University, Da Nang, Vietnam
  4. Center for Applied Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
  5. Faculty of Electrical Engineering and Computer Science, VŠB-Technical University of Ostrava, Ostrava-Poruba, Czech Republic
  6. Faculty of Information Technology, Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
  7. Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
  8. Department of Electrical and Computer Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah, Saudi Arabia
  9. Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
