Abstract
The main objective of discretization is to transform numerical attributes into discrete ones. The intention is to make it possible to apply learning algorithms that require discrete data as input and to help experts understand the data more easily. Because classification problems involve strong interactions among multiple attributes, we propose the use of evolutionary algorithms to select a subset of cut points for multivariate discretization based on a wrapper fitness function. The proposed algorithm has been compared with the best state-of-the-art discretizers using two decision-tree-based classifiers: C4.5 and PUBLIC. The reported results indicate that our proposal outperforms the other discretizers in terms of accuracy while requiring fewer intervals.
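The idea of evolving a subset of cut points under a wrapper fitness can be illustrated with a small Python sketch. It is not the authors' method. It assumes a simple generational genetic algorithm (truncation selection, uniform crossover, bit-flip mutation) in place of the evolutionary scheme used in the paper, a stand-in wrapper classifier that predicts the majority class of each discretized cell in place of C4.5 or PUBLIC, and a hypothetical fitness equal to validation accuracy minus a penalty alpha per selected cut point. Candidate cut points are midpoints between adjacent distinct values whose classes differ. Each chromosome bit switches one candidate on or off, so all attributes are discretized jointly. The function names and parameters (candidate_cut_points, discretize, cell_accuracy, fitness, evolve, alpha, p_mut) are illustrative only.

```python
import bisect
import random
from collections import Counter, defaultdict


def candidate_cut_points(values, labels):
    """Midpoints between adjacent distinct values that form a class boundary."""
    classes_at = defaultdict(set)
    for v, y in zip(values, labels):
        classes_at[v].add(y)
    distinct = sorted(classes_at)
    cuts = []
    for a, b in zip(distinct, distinct[1:]):
        # Skip only when both neighbours carry the same single class.
        if not (len(classes_at[a]) == 1 and classes_at[a] == classes_at[b]):
            cuts.append((a + b) / 2)
    return cuts


def discretize(X, cuts_per_attr, mask):
    """Map each row to a tuple of interval indices using the cuts switched on in mask."""
    selected, i = [], 0
    for cuts in cuts_per_attr:
        selected.append([c for c, bit in zip(cuts, mask[i:i + len(cuts)]) if bit])
        i += len(cuts)
    # bisect_left counts the selected cuts lying strictly below each value.
    return [tuple(bisect.bisect_left(sel, x) for sel, x in zip(selected, row)) for row in X]


def cell_accuracy(train_cells, y_train, test_cells, y_test):
    """Stand-in wrapper classifier (assumption): majority class per discretized cell."""
    votes = defaultdict(Counter)
    for cell, y in zip(train_cells, y_train):
        votes[cell][y] += 1
    default = Counter(y_train).most_common(1)[0][0]  # fallback for unseen cells
    hits = 0
    for cell, y in zip(test_cells, y_test):
        pred = votes[cell].most_common(1)[0][0] if cell in votes else default
        hits += pred == y
    return hits / len(y_test)


def fitness(mask, cuts_per_attr, Xtr, ytr, Xva, yva, alpha=0.01):
    """Hypothetical wrapper fitness: validation accuracy minus a per-cut penalty."""
    acc = cell_accuracy(discretize(Xtr, cuts_per_attr, mask), ytr,
                        discretize(Xva, cuts_per_attr, mask), yva)
    return acc - alpha * sum(mask)


def evolve(cuts_per_attr, Xtr, ytr, Xva, yva,
           pop_size=20, generations=40, p_mut=0.05, seed=0):
    """Simple generational GA (assumption) over one bit per candidate cut point."""
    rng = random.Random(seed)
    n = sum(len(c) for c in cuts_per_attr)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def score(mask):
        return fitness(mask, cuts_per_attr, Xtr, ytr, Xva, yva)

    for _ in range(generations):
        # Keep the better half, refill with offspring of random elite pairs.
        elite = sorted(pop, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover, then bit-flip mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            children.append([1 - bit if rng.random() < p_mut else bit for bit in child])
        pop = elite + children
    return max(pop, key=score)


if __name__ == "__main__":
    # Toy data: the class depends only on the first attribute; the second is noise.
    rng = random.Random(1)
    X = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
    y = [int(x0 > 5) for x0, _ in X]
    Xtr, ytr, Xva, yva = X[:100], y[:100], X[100:], y[100:]
    cuts = [candidate_cut_points([row[j] for row in Xtr], ytr) for j in range(2)]
    best = evolve(cuts, Xtr, ytr, Xva, yva)
    print("candidate cuts per attribute:", [len(c) for c in cuts])
    print("selected cuts:", sum(best))
```

Because each mask is scored on the jointly discretized data, whether a cut on one attribute survives depends on which cuts are active on the others, which is the multivariate behaviour the abstract motivates. Swapping the stand-in classifier for a decision-tree learner would bring the sketch closer to the wrapper setting described.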
Notes
1. They are specified in Table 1.
Acknowledgments
This work was partially supported by the Spanish Ministry of Science and Technology under project TIN2011-28488 and the Andalusian Research Plans P11-TIC-7765 and P10-TIC-6858.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Ramírez-Gallego, S., García, S., Benítez, J.M., Herrera, F. (2016). A Wrapper Evolutionary Approach for Supervised Multivariate Discretization: A Case Study on Decision Trees. In: Burduk, R., Jackowski, K., Kurzyński, M., Woźniak, M., Żołnierek, A. (eds) Proceedings of the 9th International Conference on Computer Recognition Systems CORES 2015. Advances in Intelligent Systems and Computing, vol 403. Springer, Cham. https://doi.org/10.1007/978-3-319-26227-7_5
DOI: https://doi.org/10.1007/978-3-319-26227-7_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26225-3
Online ISBN: 978-3-319-26227-7
eBook Packages: Engineering (R0)