Abstract
Frequent pattern mining aims to discover implicit, previously unknown, and potentially useful knowledge in the form of sets of frequently co-occurring items, events, or objects. In probabilistic datasets of uncertain data, each item in a transaction is usually associated with an existential probability expressing the likelihood of its presence in that transaction. To mine frequent patterns from such data, the UF-growth algorithm captures important information about the uncertain data in a UF-tree structure so that the expected support of each pattern can be computed. A pattern is considered frequent if its expected support meets or exceeds a user-specified threshold. A challenge, however, is that the UF-tree can be large. To address it, several algorithms use smaller trees from which upper bounds to expected support can be computed. In this paper, we examine these upper bounds and determine which ones are tighter for frequent pattern mining of uncertain big data.
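As a rough illustration of the quantities mentioned in the abstract, the following Python sketch computes the expected support of a pattern and one simple upper bound to it. It assumes the usual definition in which items within a transaction are independent, so a pattern's probability in a transaction is the product of its items' existential probabilities. The toy dataset, the function names, and the particular cap (the product of the largest probabilities in the transaction) are assumptions made for illustration. They are not taken from the paper, whose bounds may differ in detail.

```python
from math import prod

# Hypothetical toy dataset: each transaction maps an item to its
# existential probability (values are illustrative only).
transactions = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "c": 0.4},
    {"b": 0.7, "c": 0.9, "d": 0.2},
]


def expected_support(pattern, db):
    """Sum, over transactions containing the pattern, of the product of
    the existential probabilities of the pattern's items (assumes
    independence of items within a transaction)."""
    total = 0.0
    for t in db:
        if all(item in t for item in pattern):
            total += prod(t[item] for item in pattern)
    return total


def capped_support(pattern, db):
    """Illustrative upper bound: in each transaction containing the
    pattern, replace the exact product by the product of the k largest
    probabilities in the whole transaction, with k = min(2, |pattern|).
    Because every probability is at most 1, this never underestimates."""
    k = min(2, len(pattern))
    total = 0.0
    for t in db:
        if all(item in t for item in pattern):
            largest = sorted(t.values(), reverse=True)[:k]
            total += prod(largest)
    return total


def may_be_frequent(pattern, db, minsup):
    """A pattern whose upper bound falls below minsup cannot be frequent,
    so it can be pruned without computing its exact expected support."""
    return capped_support(pattern, db) >= minsup


exact = expected_support({"a", "c"}, transactions)
bound = capped_support({"a", "c"}, transactions)
```

For the pattern {a, c}, the exact value is 0.9 × 0.5 + 0.6 × 0.4 = 0.69, while the cap gives 0.9 × 0.8 + 0.6 × 0.4 = 0.96. For a fixed k the cap depends only on the transaction, not on the pattern, which is the kind of property that lets a smaller structure store less information. A tighter bound prunes more candidates before exact verification.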
Acknowledgements
This project is partially supported by NSERC (Canada) and the University of Manitoba.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Cuzzocrea, A., Leung, C.K. (2016). Computing Theoretically-Sound Upper Bounds to Expected Support for Frequent Pattern Mining Problems over Uncertain Big Data. In: Carvalho, J., Lesot, MJ., Kaymak, U., Vieira, S., Bouchon-Meunier, B., Yager, R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2016. Communications in Computer and Information Science, vol 611. Springer, Cham. https://doi.org/10.1007/978-3-319-40581-0_31
DOI: https://doi.org/10.1007/978-3-319-40581-0_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40580-3
Online ISBN: 978-3-319-40581-0
eBook Packages: Computer Science, Computer Science (R0)