Abstract
Only a subset of the boundary points—the segment borders—need to be considered when searching for an optimal multisplit of a numerical value range with respect to the attribute evaluation functions most commonly used in classification learning. Segments and their borders can be found efficiently in a linear-time preprocessing step.
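As an illustration of the preprocessing step described above, the following sketch groups the sorted examples into bins of equal attribute value and keeps a cut point candidate only between adjacent bins whose relative class distributions differ. This is a hedged, minimal rendering of the segment-border idea, not the paper's actual algorithm; the function name `segment_borders` and the midpoint convention for candidate cuts are illustrative assumptions.

```python
from collections import Counter

def segment_borders(values, labels):
    """Sketch: find candidate cut points (segment borders) for one
    numerical attribute. Linear after the initial sort."""
    # Sort example indices by attribute value (the one O(n log n) step).
    order = sorted(range(len(values)), key=lambda i: values[i])

    # Bin together examples that share the same attribute value.
    bins = []  # list of (value, Counter of class labels)
    for i in order:
        if bins and bins[-1][0] == values[i]:
            bins[-1][1][labels[i]] += 1
        else:
            bins.append((values[i], Counter({labels[i]: 1})))

    def dist(counts):
        # Relative class distribution of a bin.
        total = sum(counts.values())
        return {c: n / total for c, n in counts.items()}

    # Adjacent bins with identical class distributions merge into one
    # segment; a border lies only where the distributions differ.
    borders = []
    for (v1, c1), (v2, c2) in zip(bins, bins[1:]):
        if dist(c1) != dist(c2):
            borders.append((v1 + v2) / 2)  # midpoint as candidate cut
    return borders
```

Note how two adjacent mixed bins with the same class proportions produce no border between them, which is exactly how segments can be strictly coarser than the boundary-point partition.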
In this paper we expand the applicability of segment borders by showing that inspecting them alone suffices in optimizing any convex evaluation function. For strictly convex evaluation functions inspecting all segment borders is also necessary. These results are derived directly from Jensen's inequality.
We also study the evaluation function Training Set Error, which is not strictly convex. With that function the data can be preprocessed into an even smaller number of cut point candidates, called alternations, when searching for an optimal partition. Examining all alternations also appears necessary, since—analogously to strictly convex functions—the placement of neighboring cut points affects the optimality of an alternation. We test empirically, on real-world data, the reduction in the number of cut point candidates obtainable for Training Set Error.
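To make the alternation idea concrete: for Training Set Error, one natural characterization keeps a candidate cut only where the locally most frequent class changes between adjacent value bins. The sketch below implements that reading; the name `alternation_points`, the tie-breaking by insertion order in `most_common`, and the midpoint convention are simplifying assumptions, not the paper's definition.

```python
from collections import Counter

def alternation_points(values, labels):
    """Sketch: candidate cuts for Training Set Error, retained only
    where the majority class of adjacent value-bins alternates."""
    # Bin examples sharing the same attribute value, in sorted order.
    bins = []  # list of (value, Counter of class labels)
    for v, y in sorted(zip(values, labels)):
        if bins and bins[-1][0] == v:
            bins[-1][1][y] += 1
        else:
            bins.append((v, Counter({y: 1})))

    cuts = []
    prev_major, prev_v = None, None
    for v, counts in bins:
        # Majority class of this bin (ties resolved arbitrarily here;
        # a full treatment would handle ties more carefully).
        major = counts.most_common(1)[0][0]
        if prev_major is not None and major != prev_major:
            cuts.append((prev_v + v) / 2)  # midpoint as candidate cut
        prev_major, prev_v = major, v
    return cuts
```

Since the majority class can only alternate where the class distribution also changes, every alternation under this reading is a segment border, which is consistent with alternations being the smaller candidate set.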
Elomaa, T., Rousu, J. Linear-Time Preprocessing in Optimal Numerical Range Partitioning. Journal of Intelligent Information Systems 18, 55–70 (2002). https://doi.org/10.1023/A:1012920624627