Abstract
This article gives an overview of the field of knowledge discovery in databases and data mining, and surveys existing techniques, tools, and applications in scientific research and industrial practice. The individual phases of the knowledge discovery process are presented and analyzed. A number of data mining goals can be addressed by applying the extracted knowledge. We describe these goals and present the methods suited to achieving them. These methods are based on statistical methods, neural networks, case-based reasoning, and symbolic learning methods. We discuss several important phases of the process in more detail: the preparation of the data, the actual discovery of new knowledge, and the evaluation of the results. Knowledge discovery in databases has by now found numerous applications in a variety of fields, and the number of existing knowledge discovery systems has grown explosively. For this reason, it is not possible to describe every application or to present every existing system. We do, however, present some applications and describe a few selected systems. An overview of current research topics concludes the article.
Copyright information
© 1998 Physica-Verlag Heidelberg
About this chapter
Cite this chapter
Nakhaeizadeh, G., Reinartz, T., Wirth, R. (1998). Wissensentdeckung in Datenbanken und Data Mining: Ein Überblick. In: Nakhaeizadeh, G. (eds) Data Mining. Beiträge zur Wirtschaftsinformatik, vol 27. Physica-Verlag HD. https://doi.org/10.1007/978-3-642-86094-2_1
DOI: https://doi.org/10.1007/978-3-642-86094-2_1
Publisher Name: Physica-Verlag HD
Print ISBN: 978-3-7908-1053-0
Online ISBN: 978-3-642-86094-2
eBook Packages: Springer Book Archive