Computing Theoretically-Sound Upper Bounds to Expected Support for Frequent Pattern Mining Problems over Uncertain Big Data

  • Conference paper
Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2016)

Abstract

Frequent pattern mining aims to discover implicit, previously unknown, and potentially useful knowledge in the form of sets of frequently co-occurring items, events, or objects. To mine frequent patterns from probabilistic datasets of uncertain data, where each item in a transaction is usually associated with an existential probability expressing the likelihood of its presence in that transaction, the UF-growth algorithm captures important information about uncertain data in a UF-tree structure so that expected support can be computed for each pattern. A pattern is considered frequent if its expected support meets or exceeds the user-specified threshold. However, a challenge is that the UF-tree can be large. To handle this challenge, several algorithms use smaller trees such that upper bounds to expected support can be computed. In this paper, we examine these upper bounds, and determine which ones provide tighter upper bounds to expected support for frequent pattern mining of uncertain big data.
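As a minimal sketch of the quantities the abstract refers to, assume the definition commonly used in this line of work: with items treated as independent, the expected support of a pattern is the sum, over all transactions containing it, of the product of its items' existential probabilities. The toy dataset, the function names, and the per-transaction minimum bound below are illustrative assumptions, not the specific upper bounds examined in the paper.

```python
from math import prod

# Hypothetical toy dataset: each transaction maps an item to its
# existential probability (likelihood that the item is present).
transactions = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "b": 0.5},
    {"b": 0.7, "c": 0.4},
]


def expected_support(pattern, db):
    """Sum, over transactions containing every item of the pattern, of the
    product of the items' existential probabilities (independence assumed)."""
    return sum(
        prod(t[item] for item in pattern)
        for t in db
        if all(item in t for item in pattern)
    )


def min_probability_bound(pattern, db):
    """Illustrative upper bound (an assumption, not the paper's bound):
    a product of probabilities in [0, 1] never exceeds its smallest factor.
    The pattern is assumed to be non-empty."""
    return sum(
        min(t[item] for item in pattern)
        for t in db
        if all(item in t for item in pattern)
    )


def is_frequent(pattern, db, minsup):
    """A pattern is frequent if its expected support meets or exceeds minsup."""
    return expected_support(pattern, db) >= minsup


pattern = ("a", "b")
print(expected_support(pattern, transactions))       # 0.9*0.8 + 0.6*0.5
print(min_probability_bound(pattern, transactions))  # 0.8 + 0.5
print(is_frequent(pattern, transactions, minsup=1.0))
```

Because the bound never underestimates the expected support, any pattern whose bound falls below the threshold can be discarded without computing its exact expected support; the tighter the bound, the fewer false candidates remain to be checked.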

Acknowledgements

This project is partially supported by NSERC (Canada) and the University of Manitoba.

Corresponding author

Correspondence to Carson K. Leung.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Cuzzocrea, A., Leung, C.K. (2016). Computing Theoretically-Sound Upper Bounds to Expected Support for Frequent Pattern Mining Problems over Uncertain Big Data. In: Carvalho, J., Lesot, MJ., Kaymak, U., Vieira, S., Bouchon-Meunier, B., Yager, R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2016. Communications in Computer and Information Science, vol 611. Springer, Cham. https://doi.org/10.1007/978-3-319-40581-0_31

  • DOI: https://doi.org/10.1007/978-3-319-40581-0_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40580-3

  • Online ISBN: 978-3-319-40581-0

  • eBook Packages: Computer Science, Computer Science (R0)
