An Abstract Algebra for Knowledge Discovery in Databases

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3255)

Abstract

Knowledge discovery in databases (KDD) plays an important role in decision-making tasks by supporting end users both in exploring and understanding very large datasets and in building predictive models that remain valid over unseen data. KDD is an ad hoc, iterative process comprising tasks that range from data understanding and preparation to model building and deployment. Support for KDD should, therefore, be founded on a closure property, i.e., the ability to compose tasks seamlessly by taking the output of one task as the input of another. Despite some recent progress, KDD is still not as conveniently supported as end users have reason to expect, owing to three major problems: (1) lack of task compositionality, (2) undue dependency on user expertise, and (3) lack of generality. This paper helps to ameliorate these problems by proposing an abstract algebra for KDD, called K-algebra, whose underlying data model and primitive operations accommodate a wide range of KDD tasks. Such an algebra is a necessary step towards the development of optimisation techniques and efficient evaluation, which would, in turn, pave the way for declarative, surface KDD languages. Without such languages, end-user support will remain less than convenient, damaging the prospects for mainstream acceptance of KDD technology.
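The closure property mentioned in the abstract can be illustrated with a minimal sketch. It assumes a toy data model (a list of dictionaries) and two hypothetical operators, `keep_rows` and `add_column`; none of these names come from the paper, and they are not the K-algebra's actual data model or primitives. The only point is that every task maps a dataset to a dataset, so any output can serve as the next input.

```python
from functools import reduce
from typing import Callable, Dict, List

# Hypothetical stand-in for a single uniform data model; not the K-algebra's own.
Row = Dict[str, float]
Dataset = List[Row]
Task = Callable[[Dataset], Dataset]


def keep_rows(predicate: Callable[[Row], bool]) -> Task:
    """Preparation step: keep only the rows that satisfy the predicate."""
    return lambda data: [row for row in data if predicate(row)]


def add_column(name: str, fn: Callable[[Row], float]) -> Task:
    """Derivation step: extend every row with a computed column."""
    return lambda data: [{**row, name: fn(row)} for row in data]


def compose(*tasks: Task) -> Task:
    """Chain tasks left to right; each intermediate result is a valid input."""
    return lambda data: reduce(lambda acc, task: task(acc), tasks, data)


pipeline = compose(
    keep_rows(lambda r: r["age"] >= 18),
    add_column("age_decades", lambda r: r["age"] / 10),
)

result = pipeline([{"age": 12.0}, {"age": 30.0}])
print(result)
```

Because `compose` itself returns a `Task`, a composed pipeline can be nested inside a larger one without any adapter code, which is the practical benefit of closure.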

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gerber, L., Fernandes, A.A.A. (2004). An Abstract Algebra for Knowledge Discovery in Databases. In: Benczúr, A., Demetrovics, J., Gottlob, G. (eds) Advances in Databases and Information Systems. ADBIS 2004. Lecture Notes in Computer Science, vol 3255. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30204-9_6

  • DOI: https://doi.org/10.1007/978-3-540-30204-9_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23243-8

  • Online ISBN: 978-3-540-30204-9

  • eBook Packages: Springer Book Archive
