Frequent Itemset Discovery with SQL Using Universal Quantification

  • Chapter
Database Support for Data Mining Applications

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2682)

Abstract

Algorithms for finding frequent itemsets fall into two broad categories: algorithms that use non-trivial SQL statements to query and update a database, and algorithms that employ sophisticated in-memory data structures, with the data stored in flat files. Most performance experiments have shown that SQL-based approaches are inferior to main-memory algorithms. However, the current trend among database vendors to integrate analysis functionality into their query execution and optimization components, i.e., “closer to the data,” suggests revisiting these results and searching for new, potentially better solutions.

We investigate approaches based on SQL-92 and present a new approach called Quiver that employs universal and existential quantification. In the table schema that our approach uses for itemsets, a group of tuples represents a single itemset. Such a “vertical” layout is similar to the popular layout used for the transaction table, which is the input of frequent itemset discovery. We show that current DBMSs do not provide efficient query processing strategies for quantified queries, mostly because SQL lacks an adequate syntax for set containment tests. Performance tests using a query processor prototype and a novel query operator, called set containment division, promise improved performance for quantified queries like those used for Quiver.
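
The kind of quantified query described above can be illustrated with a minimal sketch. It is not the Quiver query from the chapter, which is not reproduced on this page; it only shows the general SQL-92 idiom of expressing "the transaction contains every item of the itemset" through two nested NOT EXISTS predicates, because SQL has no direct set containment syntax. The table names `transactions` and `candidates`, their columns, and the sample data are assumptions made for illustration, and SQLite (via Python's standard `sqlite3` module) is used only as a convenient way to run the query.

```python
import sqlite3

# Hypothetical schema: both tables use a "vertical" layout, i.e. one row
# per (set id, item) pair, so a group of rows represents one set.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (tid INTEGER, item TEXT);
CREATE TABLE candidates (itemset INTEGER, item TEXT);
INSERT INTO transactions VALUES
    (1, 'a'), (1, 'b'), (1, 'c'),
    (2, 'a'), (2, 'c'),
    (3, 'b'), (3, 'c');
INSERT INTO candidates VALUES
    (10, 'a'), (10, 'c'),
    (20, 'a'), (20, 'b');
""")

# Universal quantification via double negation: a transaction t supports
# an itemset c if there is NO item of c that is NOT present in t.
SUPPORT_QUERY = """
SELECT c.itemset, COUNT(*) AS support
FROM (SELECT DISTINCT itemset FROM candidates) AS c,
     (SELECT DISTINCT tid FROM transactions) AS t
WHERE NOT EXISTS (
    SELECT * FROM candidates AS c2
    WHERE c2.itemset = c.itemset
      AND NOT EXISTS (
          SELECT * FROM transactions AS t2
          WHERE t2.tid = t.tid AND t2.item = c2.item))
GROUP BY c.itemset
HAVING COUNT(*) >= ?
ORDER BY c.itemset
"""


def frequent_itemsets(connection, min_support):
    """Return (itemset id, support) pairs that reach min_support."""
    return connection.execute(SUPPORT_QUERY, (min_support,)).fetchall()


print(frequent_itemsets(conn, 2))
```

With the sample data, itemset 10 ({a, c}) is contained in transactions 1 and 2, while itemset 20 ({a, b}) is contained only in transaction 1. The query pairs every candidate with every transaction before testing containment, which hints at why such formulations can be expensive without dedicated operator support.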

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Rantzau, R. (2004). Frequent Itemset Discovery with SQL Using Universal Quantification. In: Meo, R., Lanzi, P.L., Klemettinen, M. (eds) Database Support for Data Mining Applications. Lecture Notes in Computer Science, vol 2682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44497-8_10

  • DOI: https://doi.org/10.1007/978-3-540-44497-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22479-2

  • Online ISBN: 978-3-540-44497-8

  • eBook Packages: Springer Book Archive
