Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

KDD Pipeline

  • Hans-Peter Kriegel
  • Matthias Schubert
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1134


Synonyms

Data mining pipeline; Data mining process; KDD process


Definition

The KDD pipeline describes the complete process of knowledge discovery in databases (KDD), i.e., the process of deriving useful, valid, and non-trivial patterns from a large amount of data. The pipeline consists of five consecutive steps:


Selection

The selection step identifies the goal of the current application and selects a data set that is likely to contain relevant patterns.


Preprocessing

The preprocessing step increases the quality of the data set by supplementing missing attributes, removing duplicate instances, and resolving data inconsistencies.
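
The three preprocessing tasks named above can be illustrated with a minimal Python sketch. The record layout (`name`, `city`, `age`), the spelling table `CANONICAL_CITY`, and the choice of mean imputation are assumptions made for this example only; the entry itself does not prescribe any particular technique.

```python
from statistics import mean

# Hypothetical table mapping inconsistent spellings to one canonical value.
CANONICAL_CITY = {"munich": "Munich", "muenchen": "Munich", "münchen": "Munich"}


def preprocess(records):
    """Return a cleaned copy of `records` (a list of dicts)."""
    # 1. Resolve inconsistencies: map spelling variants to a canonical form.
    cleaned = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is not modified
        city = rec["city"].strip()
        rec["city"] = CANONICAL_CITY.get(city.lower(), city)
        cleaned.append(rec)

    # 2. Remove duplicate instances, keeping the first occurrence.
    seen, unique = set(), []
    for rec in cleaned:
        key = (rec["name"], rec["city"], rec["age"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # 3. Supplement missing values: fill a missing age with the mean of
    #    the known ages (one simple strategy among many).
    known_ages = [r["age"] for r in unique if r["age"] is not None]
    if known_ages:
        fill = mean(known_ages)
        for rec in unique:
            if rec["age"] is None:
                rec["age"] = fill
    return unique


raw = [
    {"name": "Ada", "city": "Muenchen", "age": 30},
    {"name": "Ada", "city": "Munich ", "age": 30},   # duplicate after cleaning
    {"name": "Bob", "city": "Konstanz", "age": None},  # missing value
    {"name": "Cy", "city": "Konstanz", "age": 40},
]
clean = preprocess(raw)
```

Duplicates are removed before imputation in this sketch so that repeated records do not bias the mean used to fill the gaps.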


Transformation

The transformation step deletes correlated and irrelevant attributes and derives new, more meaningful attributes from the current data description.
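
As a rough illustration of the transformation step, the following Python sketch drops constant attributes (treated here as irrelevant), drops attributes that are strongly correlated with an attribute already kept, and then derives a new attribute. The column names, the Pearson threshold of 0.95, and the derived body mass index are assumptions chosen for the example, not part of the entry.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    # A constant column has no defined correlation; report 0.0 in that case.
    return cov / (sx * sy) if sx and sy else 0.0


def drop_redundant(columns, threshold=0.95):
    """Keep a column only if it varies and is not strongly correlated
    with a column that has already been kept."""
    kept = {}
    for name, values in columns.items():
        if len(set(values)) <= 1:
            continue  # constant attribute: carries no information
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept


columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "height_in": [59.1, 63.0, 66.9, 70.9],  # nearly a rescaled copy of height_cm
    "weight_kg": [70.0, 50.0, 80.0, 60.0],
    "sensor_id": [7.0, 7.0, 7.0, 7.0],      # constant, hence irrelevant here
}
reduced = drop_redundant(columns)

# Derive a new, more meaningful attribute from the remaining ones.
reduced["bmi"] = [
    w / (h / 100) ** 2 for w, h in zip(reduced["weight_kg"], reduced["height_cm"])
]
```

Which of two correlated attributes survives depends on iteration order in this sketch; a real application would usually choose based on domain knowledge.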

Data Mining

This step selects a data mining algorithm suited to the goal identified in the selection step and derives patterns or learns functions that are valid for the current data set.
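
The idea of choosing an algorithm according to the goal can be sketched in Python as a simple lookup. The goal names, the `ALGORITHMS` table, and the two toy algorithms (frequent item pairs as an example of deriving patterns, a nearest-centroid classifier as an example of learning a function) are hypothetical stand-ins chosen for brevity; the entry does not name specific algorithms.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequent_pairs(transactions, min_support=2):
    """Derive patterns: item pairs found in at least `min_support` transactions."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(sorted(set(items)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def nearest_centroid(samples):
    """Learn a function: return a classifier that assigns a point to the
    class whose centroid (mean feature vector) is closest."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append(features)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }

    def predict(point):
        return min(
            centroids,
            key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
        )

    return predict


# Hypothetical mapping from the goal fixed in the selection step to an algorithm.
ALGORITHMS = {"association": frequent_pairs, "classification": nearest_centroid}


def mine(goal, data):
    """Run the algorithm registered for `goal` on `data`."""
    return ALGORITHMS[goal](data)


baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["milk", "eggs"]]
patterns = mine("association", baskets)

labeled = [
    ([1.0, 1.0], "low"), ([1.5, 2.0], "low"),
    ([8.0, 9.0], "high"), ([9.0, 8.0], "high"),
]
classify = mine("classification", labeled)
```

Here `patterns` holds the item pairs that meet the support threshold, and `classify` is a function that can be applied to new points, for example `classify([2.0, 2.0])`.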



Recommended Reading

  1. Brachman R, Anand T. The process of knowledge discovery in databases: a human centered approach. In: Proceedings of the 10th National Conference on Artificial Intelligence; 1996. p. 37–8.
  2. Fayyad U, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery in databases. In: Proceedings of the 10th National Conference on Artificial Intelligence; 1996. p. 1–30.
  3. Fayyad U, Piatetsky-Shapiro G, Smyth P. Knowledge discovery and data mining: towards a unifying framework. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining; 1996. p. 82–8.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Ludwig-Maximilians-University, Munich, Germany

Section editors and affiliations

  • Daniel A. Keim (1)
  1. Computer Science Department, University of Konstanz, Konstanz, Germany