Classification Models Applied to Uncertain Data

  • Yandry Quiroz
  • Willian Zamora
  • Alex Santamaria-Philco
  • Elsa Vera
  • Patricia Quiroz-Palma
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 895)


The quality of a learning model depends directly on its training data. For that reason, data preparation is one of the most time-consuming stages of the knowledge extraction process. In the most common scenario, the training set is created under ideal conditions. The situation is often entirely different during model deployment: real-world data usually contain noise, may have missing or incorrect values, or may be uncertain, in the sense that their exact values are unknown and only approximate knowledge of them is available. In this paper, we study how to apply learning models to uncertain data. Specifically, we focus on classification problems in which uncertainty is present only in numerical attributes, and we present a new approach for applying learned classification models. Experimental results show that our methods improve accuracy in the case of maximum uncertainty.

At maximum uncertainty, Random Forest limits the degradation to 3.60%, whereas Decision Trees and Naive Bayes show higher degradation, of 5.59% and 9.60% respectively.
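To make the setting concrete, the following is a minimal sketch of one way an already-trained classifier could be applied to an instance whose numerical attributes are known only approximately. It is not the method proposed in the paper, which the abstract does not detail. It assumes that each uncertain attribute is given as an interval, that values are sampled uniformly within it, and that the final label is chosen by majority vote. The names stump_classifier and predict_uncertain, the attributes (age, income) and the thresholds are all hypothetical.

```python
import random
from collections import Counter


def stump_classifier(x):
    """Hypothetical stand-in for a trained model: fixed rules on (age, income)."""
    age, income = x
    if age < 30:
        return "low" if income < 40_000 else "medium"
    return "medium" if income < 60_000 else "high"


def predict_uncertain(model, intervals, n_samples=500, seed=0):
    """Classify an instance whose numeric attributes are intervals (lo, hi).

    Assumption: each attribute is uniformly distributed over its interval.
    Returns the majority label and the fraction of samples per label.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        # Draw one concrete instance from inside the uncertainty intervals.
        point = tuple(rng.uniform(lo, hi) for lo, hi in intervals)
        votes[model(point)] += 1
    label, _ = votes.most_common(1)[0]
    fractions = {k: v / n_samples for k, v in votes.items()}
    return label, fractions


# Exact values are the special case lo == hi; wider intervals mean more uncertainty.
print(predict_uncertain(stump_classifier, [(25, 25), (30_000, 30_000)]))
print(predict_uncertain(stump_classifier, [(25, 35), (30_000, 70_000)]))
```

The returned fractions indicate how stable the prediction is across the interval: when an interval straddles a decision threshold, the vote is split among several classes.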


Keywords: Learning models, Classification models, Random Forest, Decision trees, Naive Bayes



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Universidad Laica Eloy Alfaro de Manabi, Manta, Ecuador
  2. Universitat Politecnica de Valencia, Valencia, Spain
