From the Art of KDD to the Science of KDD

  • Y. Kodratoff
Conference paper
Part of the International Centre for Mechanical Sciences book series (CISM, volume 382)


It has already been amply shown that Knowledge Discovery in Databases (KDD) is an interesting new research field, able to provide financial returns to the companies willing to invest in it. This fact demonstrates the excellent social value of KDD. A Science, however, is not defined by this feature alone. It must also show an internal logic, stemming from a specific approach to the real-life problems it deals with. This second point of view has received less emphasis in the existing KDD literature. This paper attempts, without any pretense of being exhaustive, to begin filling this gap. We shall explain why KDD is not just “a bunch of techniques” but a real Science, certainly one still being organized, but one that shows the same strong inner motivation as other Sciences. In conclusion, we shall give a compact definition of KDD and show which concept it provides a measurement of, and as a function of which other concepts.


Keywords (machine-generated, not supplied by the author): Data Mining, Knowledge Discovery, Knowledge Acquisition, Unsupervised Learning, Inductive Logic Programming





Copyright information

© Springer-Verlag Wien 1997

Authors and Affiliations

  • Y. Kodratoff, University of Paris-Sud, Orsay, France
