Frequent Itemset Discovery with SQL Using Universal Quantification

Rantzau, Ralf

doi:10.1007/978-3-540-44497-8_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2682)

Abstract

Algorithms for finding frequent itemsets fall into two broad categories: algorithms that use non-trivial SQL statements to query and update a database, and algorithms that employ sophisticated in-memory data structures, with the data stored in flat files. Most performance experiments have shown that SQL-based approaches are inferior to main-memory algorithms. However, the current trend among database vendors to integrate analysis functionality into their query execution and optimization components, i.e., “closer to the data,” suggests revisiting these results and searching for new, potentially better solutions.

We investigate approaches based on SQL-92 and present a new approach called Quiver that employs universal and existential quantification. In our table schema for itemsets, a group of tuples represents a single itemset. Such a “vertical” layout is similar to the popular layout used for the transaction table, which is the input of frequent itemset discovery. We show that current DBMSs do not provide efficient query processing strategies for quantified queries, mostly because SQL lacks an adequate syntax for set containment tests. Performance tests using a query processor prototype and a novel query operator, called set containment division, promise improved performance for quantified queries like those used for Quiver.
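
To make the idea of a quantified query over a vertical layout concrete, the following is a minimal sketch, not the chapter's actual Quiver statements, which this abstract does not show. It assumes hypothetical tables `transactions(tid, item)` and `candidates(itemset, item)` and uses Python's standard `sqlite3` module only as a convenient way to run the SQL. The support of a candidate is the number of transactions that contain all of its items; since SQL has no direct "for all" construct, the containment test is written in the textbook way as a doubly nested NOT EXISTS ("there is no item of the candidate that is missing from the transaction").

```python
import sqlite3

# Hypothetical schema and data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Vertical layout: one row per (transaction, item) pair.
    CREATE TABLE transactions (tid INTEGER, item TEXT);
    -- Same layout for itemsets: a group of rows represents one candidate.
    CREATE TABLE candidates (itemset INTEGER, item TEXT);
""")
conn.executemany("INSERT INTO transactions VALUES (?, ?)", [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "milk"), (2, "eggs"),
    (3, "milk"), (3, "eggs"),
    (4, "bread"),
])
conn.executemany("INSERT INTO candidates VALUES (?, ?)", [
    (10, "bread"), (10, "milk"),
    (20, "milk"), (20, "eggs"),
    (30, "bread"), (30, "eggs"),
])

# A transaction t supports candidate c if there is NO item of c
# that does NOT occur in t (universal quantification via double negation).
SUPPORT_QUERY = """
SELECT c.itemset, COUNT(*) AS support
FROM (SELECT DISTINCT itemset FROM candidates) AS c,
     (SELECT DISTINCT tid FROM transactions) AS t
WHERE NOT EXISTS (
    SELECT * FROM candidates AS c2
    WHERE c2.itemset = c.itemset
      AND NOT EXISTS (
          SELECT * FROM transactions AS t2
          WHERE t2.tid = t.tid AND t2.item = c2.item))
GROUP BY c.itemset
HAVING COUNT(*) >= ?
ORDER BY c.itemset
"""

def frequent_itemsets(connection, min_support):
    """Return (itemset, support) pairs whose support is at least min_support."""
    return connection.execute(SUPPORT_QUERY, (min_support,)).fetchall()

frequent = frequent_itemsets(conn, 2)
print(frequent)  # [(10, 2), (20, 2)]
```

With a minimum support of 2, candidates 10 ({bread, milk}) and 20 ({milk, eggs}) qualify, while candidate 30 ({bread, eggs}) is contained only in transaction 2. The query pairs every candidate with every transaction before testing containment, which illustrates why such formulations can be expensive when the query processor has no dedicated strategy for set containment.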

Author information

Authors and Affiliations

Department of Computer Science, Electrical Engineering, and Information Technology, University of Stuttgart, Universitätsstraße 38, 70569, Stuttgart, Germany
Ralf Rantzau

Editor information

Editors and Affiliations

Dipartimento di Informatica, Università di Torino, Italy
Rosa Meo
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
Pier Luca Lanzi
Nokia Research Center, Nokia Group, P.O.Box 407, FIN-00045, Finland
Mika Klemettinen

About this chapter

Cite this chapter

Rantzau, R. (2004). Frequent Itemset Discovery with SQL Using Universal Quantification. In: Meo, R., Lanzi, P.L., Klemettinen, M. (eds) Database Support for Data Mining Applications. Lecture Notes in Computer Science, vol 2682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44497-8_10

DOI: https://doi.org/10.1007/978-3-540-44497-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22479-2
Online ISBN: 978-3-540-44497-8
eBook Packages: Springer Book Archive
