Abstract

No matter how “intelligent” a data mining algorithm is, it will fail to discover high-quality knowledge if it is applied to low-quality data. In this chapter we focus on data preparation methods for data mining. The general goal is to improve the quality of the data being mined, to facilitate the application of a data mining algorithm. Hence, the methods discussed in this chapter can be regarded as a form of preprocessing for a data mining algorithm.
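As a concrete illustration of the kind of preprocessing discussed here, the sketch below shows equal-width discretization, one common data-preparation step that converts a continuous attribute into ordered discrete intervals before mining. This is only a minimal illustrative example, not any specific algorithm from the chapter; the function name and data are hypothetical.

```python
# Illustrative sketch only: equal-width discretization, one common
# data-preparation step. Not the chapter's specific algorithms.

def discretize_equal_width(values, n_bins=3):
    """Map continuous values to integer bin indices using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [23, 31, 45, 52, 67, 70]
print(discretize_equal_width(ages, n_bins=3))  # → [0, 0, 1, 1, 2, 2]
```

Supervised methods (e.g., entropy-based discretization) instead choose cut points using class labels, which generally yields intervals better suited to classification.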

Keywords

Entropy · Income



Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Alex A. Freitas
    1. Computing Laboratory, University of Kent, Canterbury, UK
