Balancing Tree Size and Accuracy in Fast Mining of Uncertain Frequent Patterns

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9263)

Abstract

To mine frequent patterns from uncertain data, many existing algorithms (e.g., UF-growth) directly calculate the expected support of a pattern. Consequently, they require a significant amount of storage space to capture all existential probability values among the items in the data. To reduce the amount of required storage space, some existing algorithms (e.g., PUF-growth) combine nodes with the same item by storing an upper bound on expected support. Consequently, they lead to many false positives in the intermediate mining step. There is a trade-off between storage space and accuracy. In this paper, we introduce a new algorithm called MUF-growth for achieving a tighter upper bound on expected support than PUF-growth while balancing the storage space requirement. We evaluate the trade-off between storing more information to further tighten the bound and its effect on the performance of the algorithm. Our experimental results reveal a diminishing return on performance as the bound is increasingly tightened, allowing us to make a recommendation on the most effective use of extra storage towards increasing the efficiency of the algorithm.
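
To make the storage-versus-accuracy trade-off concrete, the following is a minimal sketch of the two quantities the abstract contrasts. It assumes the common model in which items occur independently, so that the expected support of a pattern is the sum, over transactions, of the product of its items' existential probabilities. The toy database `DB` and the function names are hypothetical, and `capped_support` is a simplified illustrative bound (one item's probability times the largest other probability in the transaction), not the exact definition used by PUF-growth or MUF-growth.

```python
from math import prod

# Hypothetical toy database: each transaction maps an item to its
# existential probability in that transaction.
DB = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "c": 0.7},
    {"b": 0.4, "c": 0.9},
]


def expected_support(pattern, db):
    """Exact expected support, assuming items occur independently."""
    return sum(
        prod(t[i] for i in pattern)
        for t in db
        if all(i in t for i in pattern)
    )


def capped_support(pattern, db):
    """Illustrative upper bound that needs only one number per transaction.

    For each transaction containing the pattern, take the probability of the
    pattern's last item (alphabetical order, an assumption of this sketch)
    times the largest probability of any other item in that transaction.
    Because every probability is at most 1, this never underestimates the
    exact product.
    """
    items = sorted(pattern)
    last = items[-1]
    total = 0.0
    for t in db:
        if not all(i in t for i in items):
            continue
        if len(items) == 1:
            total += t[last]
        else:
            total += t[last] * max(p for i, p in t.items() if i != last)
    return total


if __name__ == "__main__":
    for pattern in (("a", "c"), ("b", "c"), ("a", "b", "c")):
        print(pattern, expected_support(pattern, DB), capped_support(pattern, DB))
```

In this toy data the pattern {a, b, c} has an exact expected support of about 0.36 while the bound gives about 0.45. A gap of this kind is what can produce false positives in an intermediate mining step, and storing more per-node information is one way to narrow it.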

Acknowledgement

This project is partially supported by NSERC (Canada) and the University of Manitoba.

Author information

Corresponding author

Correspondence to Carson Kai-Sang Leung.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Leung, C.K.-S., MacKinnon, R.K. (2015). Balancing Tree Size and Accuracy in Fast Mining of Uncertain Frequent Patterns. In: Madria, S., Hara, T. (eds) Big Data Analytics and Knowledge Discovery. DaWaK 2015. Lecture Notes in Computer Science, vol 9263. Springer, Cham. https://doi.org/10.1007/978-3-319-22729-0_5

  • DOI: https://doi.org/10.1007/978-3-319-22729-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22728-3

  • Online ISBN: 978-3-319-22729-0

  • eBook Packages: Computer Science, Computer Science (R0)