Data Preprocessing in Industrial Manufacturing

  • Przemyslaw Grzegorzewski
  • Andrzej Kochanski
Part of the Studies in Systems, Decision and Control book series (SSDC, volume 183)


All scientific modeling starts from data. However, even the most sophisticated mathematical methods cannot produce a satisfactory model if the data is of low quality. Before drawing conclusions about the quality of the available data, it is worth recognizing the difference between datum quality and database quality. Moreover, most data mining algorithms expect the data in the form of a single, appropriately prepared matrix. Unfortunately, raw data is rarely stored in such a form: it is often scattered over several databases, may contain observations that differ in format or units, may abound with “garbage”, and so on. Adequate data preparation is therefore an unavoidable stage that should precede any modeling and further analysis. Both problems, data quality and data preparation, are discussed in this chapter.


Keywords: Data, Database, Data cleaning, Data integration, Data mining, Data preparation, Data reduction, Data quality, Datum quality, Data transformation, Empty value, Knowledge discovery from data (KDD), Missing data, Missing value



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
  2. Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  3. Faculty of Production Engineering, Warsaw University of Technology, Warsaw, Poland
