Abstract
We present a methodology that enables the use of classification algorithms on regression tasks. We implement this method in RECLA, a system that transforms a regression problem into a classification problem and then uses an existing classification system to solve the new problem. The transformation maps a continuous variable into an ordinal variable by grouping its values into an appropriate set of intervals. We use misclassification costs to reflect the implicit ordering among the ordinal values of the new variable. We describe a set of alternative discretization methods and, based on our experimental results, justify the need for a search-based approach to choose the best method. Our experimental results confirm the validity of our search-based approach to class discretization and reveal the accuracy benefits of adding misclassification costs.
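The transformation described in the abstract can be illustrated with a small sketch. The abstract does not specify which discretization methods or cost definition RECLA uses, so everything below is an assumption made for illustration: the equal-frequency splitting strategy, the use of each interval's median as its representative value, the absolute difference between medians as the misclassification cost, and all function names are hypothetical.

```python
from bisect import bisect_right
from statistics import median


def equal_frequency_cuts(values, n_intervals):
    """Return n_intervals - 1 cut points that split the sorted values into
    groups of roughly equal size (an assumed discretization strategy)."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, n_intervals):
        idx = k * n // n_intervals
        # Place each cut halfway between two neighbouring sorted values.
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


def to_class(value, cuts):
    """Map a continuous value to the index of the interval it falls in."""
    return bisect_right(cuts, value)


def class_medians(values, cuts):
    """Median target value of each interval, used to represent the class.
    Assumes every interval receives at least one value."""
    groups = [[] for _ in range(len(cuts) + 1)]
    for v in values:
        groups[to_class(v, cuts)].append(v)
    return [median(g) for g in groups]


def cost_matrix(medians):
    """Assumed cost of predicting class j when the true class is i:
    the distance between the two class medians, so that confusing
    distant intervals is penalized more than confusing adjacent ones."""
    return [[abs(a - b) for b in medians] for a in medians]


# Toy continuous target variable (made-up numbers for illustration only).
targets = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 30.0, 31.0, 32.0]
cuts = equal_frequency_cuts(targets, 3)
labels = [to_class(t, cuts) for t in targets]
medians = class_medians(targets, cuts)
costs = cost_matrix(medians)
```

The ordinal labels in `labels` could then be given to any classifier that accepts a cost matrix, and a predicted class could be mapped back to a number through `medians`. A search-based approach, as named in the abstract, would compare several such discretizations and keep the one that performs best.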
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Torgo, L., Gama, J. (1997). Search-based class discretization. In: van Someren, M., Widmer, G. (eds) Machine Learning: ECML-97. ECML 1997. Lecture Notes in Computer Science, vol 1224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62858-4_91
DOI: https://doi.org/10.1007/3-540-62858-4_91
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62858-3
Online ISBN: 978-3-540-68708-5
eBook Packages: Springer Book Archive