
SQL Based Frequent Pattern Mining with FP-Growth

  • Conference paper
Applications of Declarative Programming and Knowledge Management (INAP 2004, WLP 2004)

Abstract

Scalable data mining in large databases is one of today’s real challenges for the database research area. The integration of data mining with database systems is an essential component of any successful large-scale data mining application. A fundamental component of data mining tasks is finding frequent patterns in a given dataset. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns. In this study, we present an evaluation of SQL-based frequent pattern mining with a novel frequent pattern growth (FP-growth) method, which is efficient and scalable for mining both long and short patterns without candidate generation. We examine some techniques to improve performance. In addition, we have carried out a performance evaluation on a DBMS, IBM DB2 UDB EEE V8.
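As a rough illustration of what SQL-based pattern mining without candidate generation can look like, the sketch below expresses the usual first pass of FP-growth in SQL: count the support of each single item, keep the frequent ones, and reorder every transaction by descending item support (the order in which items would be inserted into an FP-tree). It is not the implementation evaluated in the paper. The table layout `transactions(tid, item)`, the function name `first_fp_growth_pass`, the sample data, and the use of SQLite in place of DB2 are all assumptions made for this example.

```python
import sqlite3

# Hypothetical sample data: (transaction id, item) pairs.
SAMPLE_ROWS = [
    (1, "a"), (1, "b"), (1, "c"),
    (2, "a"), (2, "b"),
    (3, "a"), (3, "d"),
    (4, "b"), (4, "c"),
]

# Support of each single item; only items meeting the threshold are kept.
FREQUENT_ITEMS_SQL = """
    SELECT item, COUNT(DISTINCT tid) AS support
    FROM transactions
    GROUP BY item
    HAVING COUNT(DISTINCT tid) >= ?
    ORDER BY support DESC, item ASC
"""

# Each transaction restricted to frequent items, sorted by descending support.
ORDERED_TRANSACTIONS_SQL = """
    SELECT t.tid, t.item
    FROM (SELECT DISTINCT tid, item FROM transactions) AS t
    JOIN (
        SELECT item, COUNT(DISTINCT tid) AS support
        FROM transactions
        GROUP BY item
        HAVING COUNT(DISTINCT tid) >= ?
    ) AS f ON f.item = t.item
    ORDER BY t.tid, f.support DESC, t.item ASC
"""


def first_fp_growth_pass(rows, min_support):
    """Return (frequent_items, ordered_transactions) computed with SQL.

    frequent_items: list of (item, support), most frequent first.
    ordered_transactions: dict mapping tid to its frequent items,
    sorted by descending support (ties broken alphabetically).
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")
        conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)

        frequent = conn.execute(FREQUENT_ITEMS_SQL, (min_support,)).fetchall()

        ordered = {}
        for tid, item in conn.execute(ORDERED_TRANSACTIONS_SQL, (min_support,)):
            ordered.setdefault(tid, []).append(item)
        return frequent, ordered
    finally:
        conn.close()


if __name__ == "__main__":
    items, transactions = first_fp_growth_pass(SAMPLE_ROWS, min_support=2)
    print(items)
    print(transactions)
```

Both queries are single set-oriented statements over the transaction table, so no candidate itemsets are enumerated at this stage. A full FP-growth implementation would go on to build the tree and mine conditional pattern bases, which this sketch does not attempt.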



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shang, X., Sattler, KU., Geist, I. (2005). SQL Based Frequent Pattern Mining with FP-Growth. In: Seipel, D., Hanus, M., Geske, U., Bartenstein, O. (eds) Applications of Declarative Programming and Knowledge Management. INAP 2004, WLP 2004. Lecture Notes in Computer Science, vol 3392. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11415763_3

  • DOI: https://doi.org/10.1007/11415763_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-25560-4

  • Online ISBN: 978-3-540-32124-8

  • eBook Packages: Computer Science, Computer Science (R0)
