Abstract

Missing values occur in many practical data sets. Machine learning algorithms such as CN2 require missing values in a data set to be preprocessed. Data imputation methods can provide estimates for missing values; however, imputation can introduce unintended information into the data set and thereby reduce the accuracy of rule induction algorithms. If rule induction algorithms could process missing values directly, overall performance could be improved. This paper studies the CN2 algorithm and proposes a modified version, CN2MV, which processes missing values directly without preprocessing. In tests on 17 benchmark data sets from the UCI Machine Learning Repository, CN2MV outperforms the original CN2 algorithm combined with data imputation.
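The abstract does not describe how CN2MV handles missing values internally, so the Python sketch below only illustrates the contrast it draws between imputing before induction and handling missing values directly. Everything in it is an assumption for illustration: mode imputation stands in for "data imputation", the `covers` policy (a test on a missing value counts as not satisfied) is one simple hypothetical choice and not necessarily CN2MV's, and the toy data set is invented. Rules are scored with the Laplace accuracy estimate (n_c + 1) / (n_tot + k) used by later versions of CN2.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# An example: attribute -> value (None marks a missing value), plus a class label.
Example = Tuple[Dict[str, Optional[str]], str]
# A CN2-style complex: a conjunction of attribute = value tests.
Complex = Dict[str, str]


def mode_impute(examples: List[Example]) -> List[Example]:
    """Preprocessing baseline (assumed): fill each missing value with the attribute's most common observed value."""
    attrs = {a for x, _ in examples for a in x}
    modes: Dict[str, Optional[str]] = {}
    for a in attrs:
        observed = [x[a] for x, _ in examples if x.get(a) is not None]
        modes[a] = Counter(observed).most_common(1)[0][0] if observed else None
    return [({a: v if v is not None else modes[a] for a, v in x.items()}, y)
            for x, y in examples]


def covers(cpx: Complex, x: Dict[str, Optional[str]]) -> bool:
    """Hypothetical direct policy: a test on a missing value counts as not satisfied."""
    return all(x.get(a) is not None and x[a] == v for a, v in cpx.items())


def laplace_accuracy(cpx: Complex, examples: List[Example], target: str, n_classes: int) -> float:
    """Laplace estimate (n_c + 1) / (n_tot + k) over the examples the complex covers."""
    labels = [y for x, y in examples if covers(cpx, x)]
    n_c = sum(1 for y in labels if y == target)
    return (n_c + 1) / (len(labels) + n_classes)


# Hypothetical toy data: the second example has a missing "windy" value.
data: List[Example] = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": None}, "stay"),
    ({"outlook": "rain", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
]
rule: Complex = {"outlook": "sunny", "windy": "no"}

print(laplace_accuracy(rule, data, "play", 2))               # 0.75
print(laplace_accuracy(rule, mode_impute(data), "play", 2))  # 0.6
```

In this toy case, imputing "no" for the missing value makes the rule cover a "stay" example it would otherwise leave alone, lowering its estimated accuracy from 0.75 to 0.6. This is one concrete way imputed values can mislead rule evaluation; whether CN2MV avoids it in this particular way is not stated in the abstract.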

Keywords

CN2, Missing value, Rule induction, Data imputation

Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2019

Authors and Affiliations

  1. HCMC University of Foreign Languages - Information Technology, Ho Chi Minh City, Viet Nam