Abstract
Algorithms for finding frequent itemsets fall into two broad categories: those based on non-trivial SQL statements that query and update a database, and those that employ sophisticated in-memory data structures and read the data from flat files. Most performance experiments have shown that SQL-based approaches are inferior to main-memory algorithms. However, the current trend among database vendors to integrate analysis functionality into their query execution and optimization components, i.e., “closer to the data,” suggests revisiting these results and searching for new, potentially better solutions.
We investigate approaches based on SQL-92 and present a new approach called Quiver that employs universal and existential quantification. In the table schema that our approach uses for itemsets, a group of tuples represents a single itemset. Such a “vertical” layout is similar to the popular layout used for the transaction table, which is the input of frequent itemset discovery. We show that current DBMSs do not provide efficient query processing strategies for quantified queries, mostly due to the lack of an adequate SQL syntax for set containment tests. Performance tests using a query processor prototype and a novel query operator, called set containment division, promise improved performance for quantified queries like those used for Quiver.
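To make the vertical layout and the set containment test concrete, the following is a minimal sketch and not the Quiver query from the chapter itself. It assumes two hypothetical tables, transactions(tid, item) and candidates(itemset, item), in which a group of rows represents one set, and it expresses "every item of the candidate occurs in the transaction" in SQL-92 style with two nested NOT EXISTS subqueries. The table names, column names, sample data, and the function support_counts are illustrative assumptions; SQLite is used only because it ships with Python.

```python
import sqlite3

# Support counting via universal quantification, written as a double negation:
# a transaction contains a candidate itemset if there is NO candidate item
# for which there is NO matching row in that transaction.
QUERY = """
SELECT c.itemset, COUNT(*) AS support
FROM (SELECT DISTINCT itemset FROM candidates) AS c,
     (SELECT DISTINCT tid FROM transactions) AS t
WHERE NOT EXISTS (
    SELECT * FROM candidates AS c2
    WHERE c2.itemset = c.itemset
      AND NOT EXISTS (
          SELECT * FROM transactions AS t2
          WHERE t2.tid = t.tid AND t2.item = c2.item))
GROUP BY c.itemset
HAVING COUNT(*) >= ?
"""


def support_counts(min_support):
    """Return {itemset id: support} for candidates meeting min_support."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical vertical schemas: one row per (set id, item) pair.
    conn.executescript("""
        CREATE TABLE transactions (tid INTEGER, item TEXT);
        CREATE TABLE candidates (itemset INTEGER, item TEXT);
    """)
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?)",
        [(1, "a"), (1, "b"), (1, "c"),
         (2, "a"), (2, "b"),
         (3, "a"), (3, "c"),
         (4, "b"), (4, "c"),
         (5, "a"), (5, "b")],
    )
    conn.executemany(
        "INSERT INTO candidates VALUES (?, ?)",
        [(1, "a"), (1, "b"),             # itemset 1 = {a, b}
         (2, "a"), (2, "c"),             # itemset 2 = {a, c}
         (3, "b"), (3, "c"),             # itemset 3 = {b, c}
         (4, "a"), (4, "b"), (4, "c")],  # itemset 4 = {a, b, c}
    )
    rows = conn.execute(QUERY, (min_support,)).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(support_counts(2))
```

A query of this shape pairs every candidate with every transaction before the containment test filters them, which gives a feel for why quantified queries are expensive when the DBMS has no dedicated operator for set containment.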
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Rantzau, R. (2004). Frequent Itemset Discovery with SQL Using Universal Quantification. In: Meo, R., Lanzi, P.L., Klemettinen, M. (eds) Database Support for Data Mining Applications. Lecture Notes in Computer Science, vol 2682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44497-8_10
DOI: https://doi.org/10.1007/978-3-540-44497-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22479-2
Online ISBN: 978-3-540-44497-8
eBook Packages: Springer Book Archive