Cobweb/IDX: Mapping Cobweb to SQL

Lepinioti, Konstantina; McKearney, Stephen

doi:10.1007/978-0-387-74935-8_26

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 6)

Data-mining algorithms are used in many applications to help extract meaningful information from very large datasets. For example, the Netflix [12] Web site uses hundreds of thousands of past movie ratings stored in an Oracle database to propose movies to returning customers.

Existing data-mining algorithms extract data from the database before processing it, which demands considerable time and expertise from database administrators. One way to simplify this process is to implement the algorithms as part of the database management system (DBMS) and make them accessible through standard database querying tools. However, several challenges must be overcome before data mining can be performed with off-the-shelf query tools. One is to make asking a question and interpreting the results as simple as querying a database table. A second is to develop data-mining algorithms that use the database efficiently, because database access can have major performance implications.

This chapter proposes one solution to the challenge of simplifying the data-mining process. It describes an implementation of a popular conceptual clustering algorithm, Cobweb [4], as an add-on to a DBMS; we call this implementation Cobweb/IDX. Section 26.2 describes the Cobweb algorithm. Section 26.3 gives the motivation for choosing Cobweb as the basis for our work. Section 26.4 presents Cobweb/IDX and how it maps the Cobweb algorithm to SQL. Section 26.5 examines the advantages and disadvantages of the Cobweb/IDX implementation. Section 26.6 reviews other work on integrating data mining with databases, and Section 26.7 contains a summary and directions for future work.
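As a rough illustration of what moving part of a clustering computation into SQL can look like, the sketch below scores a partition with category utility, the measure Cobweb uses to evaluate candidate clusterings (Fisher 1987; Gluck and Corter 1985). It is not the Cobweb/IDX implementation described in the chapter. The table obs(obj_id, cluster, attr, val), the helper names make_db and category_utility, and the use of SQLite from Python are all assumptions made for this example, and the sketch assumes nominal attributes with no missing values.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory table obs(obj_id, cluster, attr, val) (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE obs (obj_id INTEGER, cluster TEXT, attr TEXT, val TEXT)"
    )
    conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)", rows)
    return conn


def category_utility(conn):
    """Category utility of the partition stored in obs.

    CU = (1/K) * sum_k P(C_k) * [ sum_ij P(A_i=V_ij | C_k)^2 - sum_ij P(A_i=V_ij)^2 ]
    All counting is done by SQL GROUP BY queries; Python only combines the counts.
    """
    n = conn.execute("SELECT COUNT(DISTINCT obj_id) FROM obs").fetchone()[0]

    # Baseline term: squared probability of each attribute value, ignoring clusters.
    base = sum(
        (count / n) ** 2
        for (count,) in conn.execute("SELECT COUNT(*) FROM obs GROUP BY attr, val")
    )

    # Number of objects in each cluster.
    sizes = dict(
        conn.execute(
            "SELECT cluster, COUNT(DISTINCT obj_id) FROM obs GROUP BY cluster"
        )
    )

    # Within-cluster term: squared conditional probability of each attribute value.
    within = {cluster: 0.0 for cluster in sizes}
    for cluster, count in conn.execute(
        "SELECT cluster, COUNT(*) FROM obs GROUP BY cluster, attr, val"
    ):
        within[cluster] += (count / sizes[cluster]) ** 2

    weighted = sum((sizes[c] / n) * (within[c] - base) for c in sizes)
    return weighted / len(sizes)


# Four objects, two nominal attributes, split into two homogeneous clusters.
clean_rows = [
    (1, "A", "color", "red"), (1, "A", "shape", "round"),
    (2, "A", "color", "red"), (2, "A", "shape", "round"),
    (3, "B", "color", "blue"), (3, "B", "shape", "square"),
    (4, "B", "color", "blue"), (4, "B", "shape", "square"),
]
print(category_utility(make_db(clean_rows)))  # 0.5
```

Only the grouped counts leave the database here, not the raw rows, which is one simple way to limit the database access that the abstract identifies as a performance concern. A partition that mixes the two kinds of object evenly across both clusters scores 0.0 under the same function.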

References

1. Biswas G, Weinberg JB, Fisher DH (1998) ITERATE: A conceptual clustering algorithm for data mining. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 28(2), 219–229.
2. Clear J, Dunn D, Harvey B, Heytens ML, Lohman P, Mehta A, Melton M, Rohrberg L, Savasere A, Wehrmeister RM, Xu M (1999) Nonstop SQL/MX primitives for knowledge discovery. Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining (KDD-99), San Diego.
3. Clementine, Data Mining, Clementine, Predictive Modeling, Predictive Analytics. http://www.spss.com/clementine/, accessed July 2006.
4. Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2(2), 139–172.
5. Geist I, Sattler K (2002) Towards data mining operators in database systems: Algebra and implementation. Proceedings of 2nd International Workshop on Databases, Documents, and Information Fusion (DBFusion 2002), Karlsruhe.
6. Gennari JH, Langley P, Fisher D (1990) Models of incremental concept formation. In J. Carbonell (ed.), Machine Learning: Paradigms and Methods, MIT Press/Elsevier.
7. Gluck MA, Corter JE (1985) Information, uncertainty, and the utility of categories. Proceedings of 7th Annual Conference of the Cognitive Science Society, 283–287.
8. Hammouda K (2002) Data mining using conceptual clustering. International Conference on Data Mining (ICDM).
9. Knuth D (1997) The Art of Computer Programming, Volume 3: Sorting and Searching. Third Edition, Addison-Wesley.
10. Liu H, Lu H, Chen J (2002) A fast scalable classifier tightly integrated with RDBMS. Journal of Computer Science and Technology, 17(2), 152–159.
11. McKusick K, Thompson K (1990) COBWEB/3: A portable implementation. NASA Ames Research Center, Artificial Intelligence Research Branch, Technical Report FIA-90-6-18-2, June 20.
12. Netflix, www.netflix.com, 2006.
13. Oracle, www.oracle.com, 2006.
14. Ordonez C (2006) Integrating K-means clustering with a relational DBMS using SQL. IEEE Transactions on Knowledge and Data Engineering, 18(2), 188–201.
15. Sattler K, Dunemann O (2001) SQL database primitives for decision tree classifiers. Proceedings of the 10th ACM CIKM International Conference on Information and Knowledge Management, November 5–10, Atlanta, GA.
16. Sarawagi S, Thomas S, Agrawal R (1998) Integrating mining with relational database systems: Alternatives and implications. SIGMOD Conference, 343–354.
17. Shannon CE, Weaver W (1949) The Mathematical Theory of Communication, University of Illinois Press.
18. Sousa MS, Mattoso MLQ, Ebecken NFF (1998) Data mining: A database perspective. Proceedings, International Conference on Data Mining, WIT Press, Rio de Janeiro, Brasil, September, 413–432.
19. Theodorakis M, Vlachos A, Kalamboukis TZ (2004) Using hierarchical clustering to enhance classification accuracy. Proceedings of 3rd Hellenic Conference in Artificial Intelligence, Samos, May.
20. Wang M, Iyer B, Vitter JS (1998) Scalable mining for classification rules in relational databases. Proceedings, International Database Engineering & Application Symposium, Cardiff, UK, July 8–10, 58–67.
21. Zloof M (1975) Query by Example. AFIPS, 44.
22. Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: An efficient data clustering method for very large databases. Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, 103–114.

Author information

Authors and Affiliations

School of Design, Engineering and Computing, Bournemouth University, UK
Konstantina Lepinioti & Stephen McKearney

Editor information

Editors and Affiliations

Department of Computer Science, Tijuana Institute of Technology, 4207, Chula Vista, CA, 91909, USA
Oscar Castillo
Department of Systems Science and Engineering Yu-Quan Campus, Zhejiang University College of Electrical Engineering, 310027, Hangzhou, People's Republic of China
Li Xu
IAENG Secretariat, 37–39 Hung To Road Unit 1, 1/F, Hong Kong, People's Republic of China
Sio-Iong Ao

About this chapter

Cite this chapter

Lepinioti, K., McKearney, S. (2008). Cobweb/IDX: Mapping Cobweb to SQL. In: Castillo, O., Xu, L., Ao, SI. (eds) Trends in Intelligent Systems and Computer Engineering. Lecture Notes in Electrical Engineering, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-74935-8_26

DOI: https://doi.org/10.1007/978-0-387-74935-8_26
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-74934-1
Online ISBN: 978-0-387-74935-8
eBook Packages: Engineering, Engineering (R0)
