
Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 6))


Data-mining algorithms are used in many applications to extract meaningful information from very large datasets. For example, the Netflix [12] Web site uses hundreds of thousands of past movie ratings stored in an Oracle database to recommend movies to returning customers.

Existing data-mining algorithms extract data from the database before processing it, a step that demands considerable time and expertise from database administrators. One way to simplify this process is to implement the algorithms as part of the database management system (DBMS) and make them accessible through standard database querying tools. However, many challenges must be overcome before data mining can be performed with off-the-shelf query tools. One challenge is to make asking a question and interpreting the results as simple as querying a database table. A second is to develop data-mining algorithms that use the database efficiently, because database access can have major performance implications.

This chapter suggests one solution to the first challenge, making the data-mining process simpler. It discusses an implementation of a popular conceptual clustering algorithm, Cobweb [4], as an add-on to a DBMS. We call our implementation Cobweb/IDX. Section 26.2 describes the Cobweb algorithm. Section 26.3 explains the motivation for choosing Cobweb as the basis for our work. Section 26.4 presents Cobweb/IDX and how it maps the Cobweb algorithm to SQL. Section 26.5 discusses the advantages and disadvantages of the Cobweb/IDX implementation. Section 26.6 reviews other work on integrating data mining with databases. Finally, Section 26.7 contains a summary and directions for future work.
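To give a flavour of what expressing part of Cobweb in SQL can look like, the sketch below computes category utility, the measure Cobweb [4] uses to score a partition of observations into clusters, with SQL aggregates. It is only an illustration under stated assumptions and not the Cobweb/IDX implementation: the table `obs`, its columns (`id`, `cluster`, `attr`, `val`), the helper `make_db`, and the use of SQLite are all hypothetical choices made for this example.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database holding the hypothetical table obs.

    Each row is (id, cluster, attr, val): one nominal attribute value of
    one observation, together with the cluster that observation is in.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (id INTEGER, cluster INTEGER, attr TEXT, val TEXT)")
    conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)", rows)
    return conn


def category_utility(conn):
    """Category utility of the partition stored in obs (nominal attributes only)."""
    cur = conn.cursor()
    # Total number of distinct observations.
    (n_total,) = cur.execute("SELECT COUNT(DISTINCT id) FROM obs").fetchone()
    # Baseline term: sum over all attribute values of P(A = v)^2.
    (base,) = cur.execute(
        """
        SELECT SUM(p * p)
        FROM (SELECT COUNT(*) * 1.0 / ? AS p FROM obs GROUP BY attr, val)
        """,
        (n_total,),
    ).fetchone()
    # Per cluster: its size n and the sum of P(A = v | cluster)^2.
    rows = cur.execute(
        """
        SELECT s.cluster, s.n, SUM(c.cnt * c.cnt) * 1.0 / (s.n * s.n)
        FROM (SELECT cluster, COUNT(DISTINCT id) AS n
              FROM obs GROUP BY cluster) AS s
        JOIN (SELECT cluster, COUNT(*) AS cnt
              FROM obs GROUP BY cluster, attr, val) AS c
          ON c.cluster = s.cluster
        GROUP BY s.cluster, s.n
        """
    ).fetchall()
    # Weight each cluster's gain by P(cluster), then average over the clusters.
    k = len(rows)
    return sum((n / n_total) * (within - base) for _, n, within in rows) / k
```

In this sketch, a partition that groups identical observations together scores higher than one that mixes them, which is the behaviour Cobweb relies on when deciding where to place a new observation. The design choice illustrated is that the counting is done by `GROUP BY` queries inside the database rather than by extracting the rows first.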


References

  1. Biswas G, Weinberg JB, Fisher DH (1998) ITERATE: A conceptual clustering algorithm for data mining. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 28(2), 219–229.

  2. Clear J, Dunn D, Harvey B, Heytens ML, Lohman P, Mehta A, Melton M, Rohrberg L, Savasere A, Wehrmeister RM, Xu M (1999) Nonstop SQL/MX primitives for knowledge discovery. Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining (KDD-99), San Diego.

  3. Clementine, Data Mining, Clementine, Predictive Modeling, Predictive Analytics. http://www.spss.com/clementine/, accessed July 2006.

  4. Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2(2), 139–172.

  5. Geist I, Sattler K (2002) Towards data mining operators in database systems: Algebra and implementation. Proceedings of the 2nd International Workshop on Databases, Documents, and Information Fusion (DBFusion 2002), Karlsruhe.

  6. Gennari JH, Langley P, Fisher D (1990) Models of incremental concept formation. In J. Carbonell (ed.), Machine Learning: Paradigms and Methods, MIT Press/Elsevier.

  7. Gluck MA, Corter JE (1985) Information, uncertainty, and the utility of categories. Proceedings of the 7th Annual Conference of the Cognitive Science Society, 283–287.

  8. Hammouda K (2002) Data mining using conceptual clustering. International Conference on Data Mining (ICDM).

  9. Knuth D (1997) The Art of Computer Programming, Volume 3: Sorting and Searching. Third Edition, Addison-Wesley.

  10. Liu H, Lu H, Chen J (2002) A fast scalable classifier tightly integrated with RDBMS. Journal of Computer Science and Technology, 17(2), 152–159.

  11. McKusick K, Thompson K (1990) COBWEB/3: A portable implementation. NASA Ames Research Center, Artificial Intelligence Research Branch, Technical Report FIA-90-6-18-2, June 20.

  12. Netflix, www.netflix.com, 2006.

  13. Oracle, www.oracle.com, 2006.

  14. Ordonez C (2006) Integrating K-means clustering with a relational DBMS using SQL. IEEE Transactions on Knowledge and Data Engineering, 18(2), 188–201.

  15. Sattler K, Dunemann O (2001) SQL database primitives for decision tree classifiers. Proceedings of the 10th ACM CIKM International Conference on Information and Knowledge Management, November 5–10, Atlanta, GA.

  16. Sarawagi S, Thomas S, Agrawal R (1998) Integrating mining with relational database systems: Alternatives and implications. SIGMOD Conference, 343–354.

  17. Shannon CE, Weaver W (1949) The Mathematical Theory of Communication, University of Illinois Press.

  18. Sousa MS, Mattoso MLQ, Ebecken NFF (1998) Data mining: A database perspective. Proceedings, International Conference on Data Mining, WIT Press, Rio de Janeiro, Brasil, September, 413–432.

  19. Theodorakis M, Vlachos A, Kalamboukis TZ (2004) Using hierarchical clustering to enhance classification accuracy. Proceedings of the 3rd Hellenic Conference in Artificial Intelligence, Samos, May.

  20. Wang M, Iyer B, Vitter JS (1998) Scalable mining for classification rules in relational databases. Proceedings, International Database Engineering & Application Symposium, Cardiff, UK, July 8–10, 58–67.

  21. Zloof M (1975) Query by Example. AFIPS, 44.

  22. Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: An efficient data clustering method for very large databases. Proceedings of the ACM SIGMOD International Conference on Management of Data, Montreal, 103–114.


Copyright information

© 2008 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Lepinioti, K., McKearney, S. (2008). Cobweb/IDX: Mapping Cobweb to SQL. In: Castillo, O., Xu, L., Ao, SI. (eds) Trends in Intelligent Systems and Computer Engineering. Lecture Notes in Electrical Engineering, vol 6. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-74935-8_26


  • DOI: https://doi.org/10.1007/978-0-387-74935-8_26

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-74934-1

  • Online ISBN: 978-0-387-74935-8

  • eBook Packages: Engineering, Engineering (R0)
