Building FP-Tree on the Fly: Single-Pass Frequent Itemset Mining

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9729)

Abstract

The FP-Growth algorithm has been studied extensively in frequent pattern mining. Compared with Apriori-based algorithms, it avoids costly database scans. However, because it still requires two database scans, it cannot be used on streaming data. It is also designed for static datasets, in which the input transactions are fixed, and thus cannot be used for incremental or interactive mining. Existing incremental mining algorithms are not easily adaptable to on-the-fly, fast, and memory-efficient FP-tree mining. In this paper we propose a novel SPFP-tree (single-pass frequent pattern tree) algorithm that scans the database only once and produces the same tree as FP-Growth. Our algorithm changes the tree structure dynamically to create a highly compact, frequency-ordered tree on the fly: with the insertion of each new transaction, it maintains a tree identical to an FP-tree. Experimental results show the efficiency of the SPFP-tree algorithm in both incremental and interactive mining of frequent patterns.
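
For context, the two database scans that the abstract refers to can be illustrated with a minimal sketch of the classic two-pass FP-tree construction. This is the baseline that the paper sets out to improve on, not the SPFP-tree algorithm itself, whose details are not given in the abstract. The names `FPNode`, `build_fp_tree`, and `tree_paths` are illustrative assumptions, and the header table and node links of a full FP-tree are omitted for brevity.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its count, and children keyed by item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Baseline two-scan FP-tree construction (illustrative sketch only)."""
    # Scan 1: count how many transactions contain each item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i for i, c in support.items() if c >= min_support}

    # Scan 2: insert each transaction with its frequent items sorted by
    # descending support (ties broken by item name for determinism).
    root = FPNode()
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-support[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root, support


def tree_paths(node, prefix=()):
    """Yield (path, count) for every node in the tree, for inspection."""
    for child in sorted(node.children.values(), key=lambda n: n.item):
        path = prefix + (child.item,)
        yield path, child.count
        yield from tree_paths(child, path)
```

Because the insertion order in the second scan depends on the global item counts from the first scan, this construction needs the whole dataset up front. That dependency is what rules it out for streaming input, and it is the constraint a single-pass method has to remove by reordering the tree as counts change.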

Author information

Corresponding author

Correspondence to Nima Shahbazi.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Shahbazi, N., Soltani, R., Gryz, J., An, A. (2016). Building FP-Tree on the Fly: Single-Pass Frequent Itemset Mining. In: Perner, P. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2016. Lecture Notes in Computer Science, vol 9729. Springer, Cham. https://doi.org/10.1007/978-3-319-41920-6_30

  • DOI: https://doi.org/10.1007/978-3-319-41920-6_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41919-0

  • Online ISBN: 978-3-319-41920-6

  • eBook Packages: Computer Science, Computer Science (R0)
