Abstract
Binning (also called bucketing or discretization) is a commonly used data pre-processing technique for continuous predictor variables in machine learning. Guidelines for good binning can be treated as constraints, while certain statistics should be optimized; we therefore view binning as a constrained optimization problem. This paper presents GAbin, a novel supervised binning algorithm for binary classification problems based on a genetic algorithm, and demonstrates its use on a well-known dataset. The algorithm is inspired by the way a human analyst bins continuous variables. To bin a variable, we first choose an output shape (e.g., monotonic, or best bins in the middle). Second, we define constraints (e.g., a minimum number of samples in each bin). Finally, we try to maximize key statistics that assess the quality of the output bins. The algorithm automates these steps, and its results follow the user-desired shape and satisfy the constraints. The experimental results reveal that GAbin provides competitive results when compared to other binning algorithms. Moreover, GAbin maximizes information value and can satisfy user-desired constraints such as monotonicity or output shape controls.
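As an illustration of the quantities named in the abstract, the following minimal Python sketch scores one candidate set of cut points by its information value (IV) and penalizes constraint violations. It is not the authors' GAbin implementation: the function names, the additive smoothing term `eps`, the WoE sign convention ln(%good / %bad), and the penalty-based handling of constraints are all assumptions made for this example. A genetic algorithm could use a function like `fitness` to rank candidate cut-point sets.

```python
import math
from bisect import bisect_right


def bin_counts(values, labels, cuts):
    """Count goods (label 0) and bads (label 1) per bin.

    `cuts` must be sorted; bin i holds values with cuts[i-1] <= x < cuts[i].
    """
    n_bins = len(cuts) + 1
    goods = [0] * n_bins
    bads = [0] * n_bins
    for x, y in zip(values, labels):
        i = bisect_right(cuts, x)  # number of cut points <= x
        if y == 1:
            bads[i] += 1
        else:
            goods[i] += 1
    return goods, bads


def woe_and_iv(goods, bads, eps=0.5):
    """Return per-bin weight of evidence and the total information value.

    `eps` is additive smoothing to avoid log(0); this is an assumption of
    the sketch, not something stated in the paper's abstract.
    """
    g_tot = sum(goods) + eps * len(goods)
    b_tot = sum(bads) + eps * len(bads)
    woe, iv = [], 0.0
    for g, b in zip(goods, bads):
        pg = (g + eps) / g_tot  # smoothed share of goods in this bin
        pb = (b + eps) / b_tot  # smoothed share of bads in this bin
        w = math.log(pg / pb)
        woe.append(w)
        iv += (pg - pb) * w
    return woe, iv


def is_monotonic(seq):
    """True if seq is entirely non-decreasing or entirely non-increasing."""
    inc = all(a <= b for a, b in zip(seq, seq[1:]))
    dec = all(a >= b for a, b in zip(seq, seq[1:]))
    return inc or dec


def fitness(values, labels, cuts, min_samples=2, penalty=1.0):
    """IV minus a penalty per violated constraint (hypothetical scheme)."""
    cuts = sorted(cuts)
    goods, bads = bin_counts(values, labels, cuts)
    woe, iv = woe_and_iv(goods, bads)
    # Constraint 1: every bin needs at least `min_samples` observations.
    violations = sum(1 for g, b in zip(goods, bads) if g + b < min_samples)
    # Constraint 2: the WoE sequence must be monotonic across bins.
    if not is_monotonic(woe):
        violations += 1
    return iv - penalty * violations


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6, 7, 8]
    labels = [0, 0, 0, 1, 0, 1, 1, 1]
    print(fitness(values, labels, [4.5]))  # balanced split, no penalty
    print(fitness(values, labels, [1.5]))  # first bin too small, penalized
```

A penalty term is only one way to handle constraints; candidates that violate a constraint could instead be repaired or discarded outright.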
Change history
28 October 2019
In the originally published version of the paper on p. 158, the name of the Author was incorrect. The name of the Author has been corrected as “Pramote Kuacharoen”.
In the originally published version of the paper on p. 357, the affiliation of the Author was incorrect. The affiliation has been corrected as “Universidad Distrital Francisco Jose de Caldas, Bogota, Colombia”.
In the originally published version of the paper on p. 373, the affiliation of the Author was incorrect. The affiliation has been corrected as “Universidad Distrital Francisco Jose de Caldas, Bogota, Colombia”.
Acknowledgments
This research was partially supported by Taskworld Inc.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Vejkanchana, N., Kuacharoen, P. (2019). Continuous Variable Binning Algorithm to Maximize Information Value Using Genetic Algorithm. In: Florez, H., Leon, M., Diaz-Nafria, J., Belli, S. (eds) Applied Informatics. ICAI 2019. Communications in Computer and Information Science, vol 1051. Springer, Cham. https://doi.org/10.1007/978-3-030-32475-9_12
DOI: https://doi.org/10.1007/978-3-030-32475-9_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32474-2
Online ISBN: 978-3-030-32475-9
eBook Packages: Computer Science, Computer Science (R0)