Data-mining algorithms are used in many applications to help extract meaningful data from very large datasets. For example, the Netflix [12] Web site uses hundreds of thousands of past movie ratings stored in an Oracle database to propose movies to returning customers.
Existing data-mining algorithms extract data from the database before processing it, but this requires a lot of time and expertise from database administrators. One way to simplify this process is to develop the algorithms as part of the database management system (DBMS) and make them accessible through standard database querying tools. However, many challenges must be overcome before data mining can be performed with off-the-shelf query tools. One challenge is to make asking a question and interpreting the results as simple as querying a database table. A second challenge is to develop data-mining algorithms that use the database efficiently, because database access can have major performance implications.
This chapter suggests one solution to the challenge of making the data-mining process simpler. It discusses an implementation of a popular conceptual clustering algorithm, Cobweb [4], as an add-on to a DBMS. We call our implementation Cobweb/IDX. Section 26.2 describes the Cobweb algorithm. Section 26.3 discusses the motivation for choosing Cobweb as the basis for our work. Section 26.4 discusses Cobweb/IDX and how it maps the Cobweb algorithm to SQL. Section 26.5 considers the advantages and disadvantages of the Cobweb/IDX implementation. Section 26.6 presents other work on integrating data mining with databases and, finally, Section 26.7 contains a summary and directions for future work.
References
Biswas G, Weinberg JB, Fisher DH (1998) ITERATE: A conceptual clustering algorithm for data mining. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 28(2), 219–229.
Clear J, Dunn D, Harvey B, Heytens ML, Lohman P, Mehta A, Melton M, Rohrberg L, Savasere A, Wehrmeister RM, Xu M (1999) Nonstop SQL/MX primitives for knowledge discovery. Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining (KDD-99), San Diego.
Clementine: Data Mining, Predictive Modeling, Predictive Analytics. http://www.spss.com/clementine/, accessed July 2006.
Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2(2), 139–172.
Geist I, Sattler K (2002) Towards data mining operators in database systems: Algebra and implementation. Proceedings of 2nd International Workshop on Databases, Documents, and Information Fusion (DBFusion 2002), Karlsruhe.
Gennari JH, Langley P, Fisher D (1990) Models of incremental concept formation. In J. Carbonell (ed.), Machine Learning: Paradigms and Methods, MIT Press/Elsevier.
Gluck MA, Corter JE (1985) Information, uncertainty, and the utility of categories. Proceedings of 7th Annual Conference of the Cognitive Science Society, 283–287.
Hammouda K (2002) Data mining using conceptual clustering. International Conference on Data Mining (ICDM).
Knuth D (1997) The Art of Computer Programming, Volume 3: Sorting and Searching. Third Edition, Addison-Wesley.
Liu H, Lu H, Chen J (2002) A fast scalable classifier tightly integrated with RDBMS. Journal of Computer Science and Technology, 17(2), 152–159.
McKusick K, Thompson K (1990) COBWEB/3: A portable implementation. NASA Ames Research Center, Artificial Intelligence Research Branch, Technical Report FIA-90-6-18-2, June 20.
Netflix, www.netflix.com, 2006.
Oracle, www.oracle.com, 2006.
Ordonez C (2006) Integrating K-means clustering with a relational DBMS using SQL. IEEE Transactions on Knowledge and Data Engineering, 18(2), 188–201.
Sattler K, Dunemann O (2001) SQL database primitives for decision tree classifiers. Proceedings of the 10th ACM CIKM International Conference on Information and Knowledge Management, November 5–10, Atlanta, GA.
Sarawagi S, Thomas S, Agrawal R (1998) Integrating mining with relational database systems: Alternatives and implications. SIGMOD Conference, 343–354.
Shannon CE, Weaver W (1949) The Mathematical Theory of Communication, University of Illinois Press.
Sousa MS, Mattoso MLQ, Ebecken NFF (1998) Data mining: A database perspective. Proceedings, International Conference on Data Mining, WIT Press, Rio de Janeiro, Brasil, September, 413–432.
Theodorakis M, Vlachos A, Kalamboukis TZ (2004) Using hierarchical clustering to enhance classification accuracy. Proceedings of 3rd Hellenic Conference in Artificial Intelligence, Samos, May.
Wang M, Iyer B, Vitter JS (1998) Scalable mining for classification rules in relational databases. Proceedings, International Database Engineering & Application Symposium, Cardiff, UK, July 8–10, 58–67.
Zloof M (1975) Query by Example. AFIPS, 44.
Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: An efficient data clustering method for very large databases. Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, 103–114.
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Lepinioti, K., McKearney, S. (2008). Cobweb/IDX: Mapping Cobweb to SQL. In: Castillo, O., Xu, L., Ao, SI. (eds) Trends in Intelligent Systems and Computer Engineering. Lecture Notes in Electrical Engineering, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-74935-8_26
DOI: https://doi.org/10.1007/978-0-387-74935-8_26
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-74934-1
Online ISBN: 978-0-387-74935-8
eBook Packages: Engineering, Engineering (R0)