Journal of Intelligent Information Systems, Volume 4, Issue 1, pp. 53–69

Efficient discovery of interesting statements in databases

Willi Klösgen

Abstract

The Explora system supports Discovery in Databases by large-scale search for interesting instances of statistical patterns. In this paper we describe how Explora assesses interestingness and achieves computational efficiency. Both problems arise from the variety of patterns and the immense number of combinatorial possibilities for generating instances when relations between variables are studied in subsets of the data. First, the user must be kept from being overwhelmed by a deluge of findings. To restrict the search according to the analysis goals, the user can focus each discovery task performed during an interactive and iterative exploration process. Some basic organization principles of search further limit the search effort. One principle is to organize the search hierarchically and to evaluate first the statistical or information-theoretic evidence for the general hypotheses. More special hypotheses can then be eliminated from further search if a more general hypothesis has already been verified. This approach alone has drawbacks, however, and even in moderately sized data it does not prevent large sets of findings. Therefore, further aspects of interestingness are assessed in a second evaluation phase, in which a refinement strategy selects the most interesting of the statistically significant statements. The second problem for discovery systems is efficiency: each hypothesis evaluation requires many data accesses. We describe strategies that reduce data accesses and speed up computation.
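The hierarchical search principle described in the abstract (evaluate general hypotheses first, then skip more special hypotheses once a more general one is verified) can be illustrated with a minimal Python sketch. This is not Explora's implementation. The function names `z_score` and `hierarchical_search`, the representation of subgroups as conjunctions of attribute=value selectors, the binary target attribute, the one-sample z statistic for a share, and the 1.96 threshold are all assumptions chosen for illustration. The sketch covers only the first evaluation phase; it does not model the second-phase refinement strategy or the paper's data-access optimizations, and it rescans all rows for every hypothesis.

```python
import math
from itertools import combinations


def z_score(n_sub, hits_sub, p0):
    """Hypothetical test statistic: one-sample z score for the deviation of the
    target share in a subgroup (hits_sub / n_sub) from the overall share p0."""
    if n_sub == 0 or p0 in (0.0, 1.0):
        return 0.0
    p = hits_sub / n_sub
    return (p - p0) * math.sqrt(n_sub) / math.sqrt(p0 * (1 - p0))


def hierarchical_search(rows, target, attributes, max_depth=2, z_crit=1.96):
    """Hypothetical sketch: evaluate subgroups from general (one selector) to
    special (max_depth selectors) and prune every specialization of a subgroup
    whose deviation was already found significant."""
    p0 = sum(r[target] for r in rows) / len(rows)
    # All attribute=value selectors that occur in the data.
    selectors = sorted({(a, r[a]) for r in rows for a in attributes})
    verified = []
    for depth in range(1, max_depth + 1):
        for combo in combinations(selectors, depth):
            if len({a for a, _ in combo}) < depth:
                continue  # two values of one attribute can never hold together
            if any(set(v) <= set(combo) for v in verified):
                continue  # a more general hypothesis is already verified: skip
            sub = [r for r in rows if all(r[a] == val for a, val in combo)]
            hits = sum(r[target] for r in sub)
            if abs(z_score(len(sub), hits, p0)) >= z_crit:
                verified.append(combo)
    return verified
```

For example, if the target share is 90% for `sex == "f"` and 10% for `sex == "m"` but identical across regions, both one-selector subgroups on `sex` are verified at depth 1, and every depth-2 combination of `sex` with `region` is pruned without being evaluated. Pruning by set inclusion is the simplest reading of "more special hypothesis"; a real system would also need to decide whether a special subgroup that deviates much more strongly than its parent still deserves to be reported.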

Keywords

discovery in databases, interestingness, multidimensional search, Explora

Copyright information

© Kluwer Academic Publishers 1995

Authors and Affiliations

Willi Klösgen, German National Research Center for Computer Science (GMD), Sankt Augustin, Germany