Handling Missing Values for the CN2 Algorithm

Nguyen, Cuong Duc; Tran, Phuong-Tuan; Thai, Thi-Thanh-Thao

doi:10.1007/978-3-030-06152-4_20

Conference paper
First Online: 30 December 2018

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 266)

Abstract

Missing values occur in many practical data sets. Machine learning algorithms such as CN2 require that missing values in a data set be pre-processed. Data imputation methods can supply estimates for a missing value. However, imputation can introduce unexpected information into the data set and thereby reduce the accuracy of rule induction algorithms. If rule induction algorithms can process missing values directly, overall performance can improve. This paper studies the CN2 algorithm and proposes a modified version, CN2MV, that processes missing values directly, without preprocessing. Tested on 17 benchmark data sets from the UCI Machine Learning Repository, CN2MV outperforms the original algorithm combined with data imputation.
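The concern about imputation introducing unexpected information can be illustrated with a minimal Python sketch. It is not the CN2MV procedure, which the abstract does not spell out. The `MISSING` marker, the helper names `impute_mode` and `covers`, the `missing_matches` policy, and the toy data are all assumptions made for illustration. The sketch compares most-common-value imputation with a rule-coverage test that sees the missing value as missing.

```python
from collections import Counter

MISSING = None  # hypothetical marker for a missing attribute value


def impute_mode(rows, attr):
    """Return copies of rows with missing `attr` values replaced by the
    most common observed value (assumes at least one observed value)."""
    observed = [r[attr] for r in rows if r[attr] is not MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [{**r, attr: mode if r[attr] is MISSING else r[attr]} for r in rows]


def covers(rule, row, missing_matches=False):
    """Test whether a conjunctive rule (attribute -> required value) covers a row.

    A missing value satisfies a condition only if `missing_matches` is True.
    This policy is an illustrative assumption, not the CN2MV definition.
    """
    for attr, value in rule.items():
        if row[attr] is MISSING:
            if not missing_matches:
                return False
        elif row[attr] != value:
            return False
    return True


# Toy data: the third example has an unknown "outlook" and class "no".
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": MISSING, "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
rule = {"outlook": "sunny", "windy": "no"}

# After imputation the third example is treated as "sunny" and the rule covers it.
imputed = impute_mode(rows, "outlook")
covered_after_imputation = [covers(rule, r) for r in imputed]

# Working on the raw data, the rule does not claim the example with the unknown value.
covered_directly = [covers(rule, r) for r in rows]
```

In this toy data, imputation makes the rule cover a "no" example on the basis of a value that was never observed, so the rule looks less pure than the recorded data supports. How a learner should treat such examples is a design choice that this sketch leaves open through `missing_matches`.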

Author information

Authors and Affiliations

HCMC University of Foreign Languages - Information Technology, Ho Chi Minh City, Viet Nam
Cuong Duc Nguyen, Phuong-Tuan Tran & Thi-Thanh-Thao Thai

Corresponding author

Correspondence to Cuong Duc Nguyen.

Editor information

Editors and Affiliations

Nguyen Tat Thanh University, Ho Chi Minh City, Vietnam
Phan Cong Vinh
Concordia University, Montreal, QC, Canada
Vangalur Alagar

About this paper

Cite this paper

Nguyen, C.D., Tran, P.-T., Thai, T.-T.-T. (2019). Handling Missing Values for the CN2 Algorithm. In: Cong Vinh, P., Alagar, V. (eds) Context-Aware Systems and Applications, and Nature of Computation and Communication. ICCASA 2018, ICTCC 2018. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 266. Springer, Cham. https://doi.org/10.1007/978-3-030-06152-4_20

DOI: https://doi.org/10.1007/978-3-030-06152-4_20
Published: 30 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06151-7
Online ISBN: 978-3-030-06152-4
eBook Packages: Computer Science, Computer Science (R0)