Summary
We present an abstract model in which the data preprocessing and data mining proper stages of the Data Mining process are described as two different types of generalization. In the model, data mining and data preprocessing algorithms are defined as certain generalization operators. We use our framework to show that only three Data Mining operators (classification, clustering, and association) are needed to express all Data Mining algorithms for classification, clustering, and association, respectively. We are also able to show formally that the generalization that occurs in the preprocessing stage is different from the generalization inherent to the data mining proper stage.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Wasilewska, A., Menasalvas, E. (2008). Data Preprocessing and Data Mining as Generalization. In: Lin, T.Y., Xie, Y., Wasilewska, A., Liau, CJ. (eds) Data Mining: Foundations and Practice. Studies in Computational Intelligence, vol 118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78488-3_27
DOI: https://doi.org/10.1007/978-3-540-78488-3_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78487-6
Online ISBN: 978-3-540-78488-3
eBook Packages: Engineering, Engineering (R0)