Abstract
To mine frequent patterns from uncertain data, many existing algorithms (e.g., UF-growth) directly calculate the expected support of a pattern. Consequently, they require a significant amount of storage space to capture all existential probability values of the items in the data. To reduce the required storage space, some existing algorithms (e.g., PUF-growth) combine nodes with the same item by storing an upper bound on expected support. Consequently, they produce many false positives in the intermediate mining step. There is thus a trade-off between storage space and accuracy. In this paper, we introduce a new algorithm called MUF-growth, which achieves a tighter upper bound on expected support than PUF-growth while balancing the storage space requirement. We evaluate the trade-off between storing more information to further tighten the bound and the effect of doing so on the performance of the algorithm. Our experimental results reveal diminishing returns in performance as the bound is increasingly tightened, allowing us to recommend the most effective use of extra storage for increasing the efficiency of the algorithm.
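The trade-off between exact expected support and a stored upper bound can be illustrated with a small sketch. It is not the paper's implementation. It assumes the usual definition of expected support (the sum, over transactions, of the product of the existential probabilities of the pattern's items, with items treated as independent) and a cap-style bound in the spirit of what the abstract attributes to PUF-growth: the probability of the pattern's last item times the largest probability among the items that precede it in the transaction. The toy database, the item order, and the function names are all hypothetical.

```python
from math import prod

# Hypothetical toy database: each transaction maps an item to its
# existential probability in that transaction.
transactions = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "c": 0.4},
    {"a": 0.7, "b": 0.2, "c": 0.9},
]
item_order = ["a", "b", "c"]  # assumed global ordering of items


def expected_support(pattern, db):
    """Exact value: sum over transactions of the product of item probabilities."""
    return sum(
        prod(t[i] for i in pattern)
        for t in db
        if all(i in t for i in pattern)
    )


def capped_support(pattern, db, order):
    """Upper bound: P(last item) times the largest preceding probability."""
    last = max(pattern, key=order.index)
    total = 0.0
    for t in db:
        if not all(i in t for i in pattern):
            continue
        if len(pattern) == 1:
            total += t[last]
            continue
        # Every item of the transaction that precedes `last` in the order,
        # whether or not it belongs to the pattern.
        preceding = [p for i, p in t.items() if order.index(i) < order.index(last)]
        total += t[last] * max(preceding)
    return total


for pattern in ({"a", "c"}, {"b", "c"}, {"a", "b", "c"}):
    exact = expected_support(pattern, transactions)
    bound = capped_support(pattern, transactions, item_order)
    print(sorted(pattern), round(exact, 3), round(bound, 3))
```

In this toy data the bound equals the exact value for {a, c} but overestimates {b, c} and {a, b, c}. With a minimum support threshold between the two values, such a pattern would pass the bound check and only be rejected later, which is the kind of false positive the abstract describes.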
Acknowledgement
This project is partially supported by NSERC (Canada) and the University of Manitoba.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Leung, C.K.-S., MacKinnon, R.K. (2015). Balancing Tree Size and Accuracy in Fast Mining of Uncertain Frequent Patterns. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2015. Lecture Notes in Computer Science, vol 9263. Springer, Cham. https://doi.org/10.1007/978-3-319-22729-0_5
DOI: https://doi.org/10.1007/978-3-319-22729-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22728-3
Online ISBN: 978-3-319-22729-0
eBook Packages: Computer Science, Computer Science (R0)